1 Title of Invention
Method to utilize 5' ends of transcribed regions for cloning and analysis

2 Claims 

1. A method to prepare nucleic acid tags corresponding to the 5' end of transcribed regions, said mRNA.

2. A method according to claim 1 where concatamers of such 5' end 
tags are produced. 

[Claim 2] A method in which such 5' end specific sequence tags derived from transcribed regions, said mRNA, are analyzed by sequencing.

[Claim 3] The method for preparing concatamers of a plurality of at least two or more nucleic acid fragments having information on nucleotide sequences of 5' end regions of a plurality of nucleic acids related to transcribed in a sample, comprising

A first step of selectively collecting a plurality of cDNAs containing regions complementary to 5'-end regions of mRNAs, which cDNAs are formed by using RNA or mRNAs derived from a biological sample or in vitro synthesized RNA derived from cDNA- or tag-libraries in the sample as templates;

A second step of collecting fragments containing cDNA regions including at least the regions corresponding to the 5'-end regions of said mRNAs or cDNA;

And a third step of creating a concatamer of such 5' end nucleic acid tags.

[Claim 4] The method as in claim 4 but in which
The first step is substituting the cap structure of mRNAs with an oligonucleotide;

The second step consists in the formation of full-length cDNA;

The third step involves cleavage of a 5' end tag and formation of concatamers.
[Claim 5] The method according to claim 4, wherein said first step comprises the steps of synthesizing the first-strand cDNAs using mRNAs as templates; attaching a selective binding substance to the cap structures of said mRNAs; cleaving single-stranded RNAs; binding said selective binding substance to a corresponding selective binding substance immobilized on a support, which corresponding selective binding substance selectively binds to said selective binding substance; and recovering said cDNA.

[Claim 6] A method as in claim 4 where the first step to isolate the full-length cDNA includes an RNase digestion step followed by treatment with an immobilized cap-binding substance followed by eluting such full-length cDNAs.

[Claim 7] A method to add a sequence to the 5' end of nucleic acids corresponding to the 5' terminal part of a transcript, such that the sequence can be recognized by a substance that is capable of cleaving such nucleic acids outside the recognition sequence.

[Claim 8] The method according to claim 4, wherein said selective binding substance is biotin, and said corresponding selective binding substance is avidin, streptavidin or an avidin or streptavidin derivative which specifically binds to biotin.

[Claim 9] The method according to claim 4 where the selective binding substance is digoxigenin and said corresponding binding substance is an antibody directed against digoxigenin.

[Claim 10] A method according to claim 4 or 9, wherein a selective binding substance is bound to a corresponding selective binding substance which is immobilized onto a support, and where such a support is made of magnetic beads, agarose beads, or latex beads.

[Claim 11] The method according to any one of claims 4 and 6 to 11, wherein said second step comprises the steps of binding a linker having at least a restriction site for a substance that cleaves DNA outside its recognition sequence in the end region corresponding to the 5' end of said nucleic acids corresponding to the 5' end of genes, and a random oligomer region at the 3' end region; synthesizing a second-strand cDNA using said linker or other oligonucleotides partially or totally corresponding to the linker as a primer and said cDNA as a template; treating the obtained linker-bound double-stranded cDNA with said restriction enzyme; and selectively recovering fragments yielded by cleavage by the restriction enzyme, which fragments contain said linker moieties and part of 5' end cDNA.

[Claim 12] The method according to any of claims 4 to 12, wherein a selective binding substance is attached to said linker; and the step of selectively recovering said fragments containing said linker moieties comprises the steps of binding said selective binding substance to a corresponding selective binding substance immobilized on a support, which corresponding selective binding substance selectively binds to said selective binding substance; and recovering said support.

[Claim 13] The method according to any of claims 4 to 13, wherein said selective binding substance is biotin, and said corresponding selective binding substance is avidin, streptavidin, or an avidin derivative or derivatives of streptavidin which specifically binds to biotin.

[Claim 14] The method according to any of claims 4 to 13 where the selective binding substance is digoxigenin and said corresponding binding substance is an antibody directed against digoxigenin.

[Claim 15] The method according to any one of claims 4 to 15, wherein said restriction enzyme is a substance with an enzymatic activity to recognize nucleic acid and to cleave at a site different from the recognition site.

[Claim 16] The method according to any one of claims 4 to 16, wherein said restriction enzyme is a class II restriction enzyme like Gsu I, Mme I, Bpm I or Bsg I.

[Claim 17] The method using nucleic acid fragments obtained according to any one of claims 1 to 18, further comprising the step of cloning into a concatamer.

[Claim 18] A method for determining nucleotide sequences of 5'-end regions of a plurality of mRNAs by sequencing said concatemer prepared by the method according to any one of claims 1 to 18.

[Claim 19] A method, which is the same method according to any one of claims 1 to 18, except that preliminarily obtained cDNAs having complete length are used instead of carrying out said first step.

[Claim 20] A method to produce 5' end nucleic acid tags corresponding to the 5' ends of mRNA, in which a mixture of RNA molecules is prepared from a preexisting full-length cDNA library and the obtained RNAs carry at their 5' end a sequence cleavable by a substance able to recognize a nucleic acid and cleave outside its recognition sequence.

[Claim 21] A method to produce 5' end nucleic acid tags corresponding to the 5' ends of mRNA, in which a mixture of nucleic acid TAG molecules is prepared from a preexisting full-length cDNA library carrying close to the 5' end a sequence cleavable by a substance able to recognize a nucleic acid and cleave outside its recognition sequence, which is used to produce a nucleic acid TAG molecule.

[Claim 22] The concatemer prepared by the method according to any one of claims 1 to 22.

[Claim 23] A vector comprising said concatemer according to claim 23.

[Claim 24] A sequence, which is derived from a concatemer prepared by the method according to any one of claims 1 to 22.



tBaE#2 003-3072107 




2002-171851 






[Claim 25] A method based on any of claims 1 to 22, which allows to determine the transcriptional status of a given cell and therefore the transcriptional networking.

[Claim 26] A method, which is the same method according to any one of claims 1 to 22, to obtain expression data on a plurality of mRNA or cDNA in a sample.

[Claim 27] A method, which is the same method according to any one of claims 1 to 22, to quantify expression data on a plurality of mRNA in a sample.

[Claim 28] A method which uses sequence information obtained from concatemers prepared by a method according to any one of claims 1 to 22 to build a database holding sequence information derived from the concatemers.

[Claim 29] A method, which is the same method according to any one of claims 1 to 22, to identify open reading frames in a genomic sequence, said genome.

[Claim 30] A method, which is the same method according to any one of claims 1 to 22, to identify start sites of transcription and regulatory sequences upstream of the start site of transcription in a genomic sequence, said genome.

[Claim 31] A method, which uses sequence information obtained from concatemers prepared by a method according to any one of claims 1 to 22, to clone a full-length or partial cDNA from a plurality of nucleic acids.

[Claim 32] A method, which uses sequence information obtained from concatemers prepared by a method according to any one of claims 1 to 22, to analyze the activity of regulatory regions in a genome, said prom


[Claim 33] A method, which uses sequence information obtained from concatemers prepared by a method according to any one of claims 1 to 22, to inactivate a gene.

[Claim 34] A method, which uses sequence information obtained from concatemers prepared by a method according to any one of the claims 1 to 22, to synthesize nucleotide sequences, said linker.

[Claim 35] A method, which uses sequence information obtained from concatemers prepared by a method according to any one of the claims 1 to 22, to synthesize nucleotide sequences, said primers.

[Claim 36] A method, which uses sequence information obtained from concatemers prepared by a method according to any one of the claims 1 to 22, to obtain extended nucleotide sequences derived from the 5'-ends of transcripts, said sequencing.

[Claim 37] A method according to any one of claims 1 to 8, wherein a single-stranded cDNA is ligated to a double-stranded synthetic oligonucleotide, said linker, wherein the linker has a single-stranded overhang encompassing a nucleotide sequence, said tag, which was obtained from concatemers prepared by a method according to any one of the claims 1 to 13, wherein the linker is attached to a selective binding substance and the selective binding substance is attached to a corresponding selective binding substance, said support, and where such linker bound to the support is used to enrich a specific nucleotide sequence, said 1st strand cDNA, said RNA transcript.

[Claim 38] A method according to any one of claims 1 to 8, wherein a single-stranded cDNA is ligated to a double-stranded linker, said primer, where a selective binding substance is attached to said linker, and where the selective binding substance is attached to a corresponding selective binding substance, said support, and where such DNA template is used to obtain the nucleotide sequences of the 5'-region of an initial transcript, said RNA.

[Claim 39] A method based on any of the claims 1-39 to be used for the development of diagnostic tools.
3 Detailed Description of Invention 
[0001]

The present invention relates to a method to selectively collect multiple nucleic acid fragments containing information on the nucleotide sequences at the 5' end site of multiple mRNAs within a sample. The method of the present invention is effective for analyzing the mRNAs contained within the sample, for discovering new genes, and for studies on gene regulation.

[0002]

To utilize genomic information, parts of the genome are transcribed into mRNA. For the understanding of the genome and its use in regulatory processes, information on individual mRNA species is required, which should include their partial or full-length nucleotide sequence and their relative or absolute quantity in a given biological context.

[0003]

Conventionally, the base sequences of mRNAs contained in a cell or tissue sample have been analyzed by preparing a cDNA library by reverse transcription, using mRNAs as templates, and investigating the individual insert cDNA fragments within said cDNA library. Since a sample contains a large number of varied mRNAs, the conventional method is of limited efficiency for analyzing gene expression profiles and identifying rare genes. Therefore other technologies have been invented to monitor the expression patterns of mRNA in complex samples and to identify genes by short sequence elements, so-called tags.

[0004]

High-throughput expression profiling is commonly performed by the use of so-called DNA microarrays (Jordan B., DNA Microarrays: Gene Expression Applications, Springer-Verlag, Berlin Heidelberg New York, 2001; Schena M., DNA Microarrays, A Practical Approach, Oxford University Press, Oxford 1999). For such experiments specific probes representing individual genes or transcripts are placed on a support and simultaneously hybridized with a plurality of samples. Positive signals will be obtained where a probe on the support reacts with a molecule present in the sample. These experiments allow the parallel analysis of a large number of genes or transcripts. However, the approach is limited by the fact that only genes or transcripts that were initially identified by other experimental means can be studied. Such means can include cDNA libraries, partial sequence tags and/or results obtained from computer predictions. Due to the limitations of DNA microarray experiments, alternative approaches are in use for gene discovery and expression profiling, which are based on partial sequences, so-called tags, obtained from a plurality of mRNA samples.

[0005]

The so-called SAGE (Serial Analysis of Gene Expression) method is known as an efficient method of obtaining partial information on the base sequences in mRNAs (Velculescu V.E. et al., Science 270, 484-487 (1995)). This method forms DNA concatamers by ligating multiple short DNA fragments (about 10 bp) containing information on the base sequences at the 3' end site of multiple mRNAs, and determines the base sequences in these DNA concatamers. It is a method for finding out partial information on the base sequences at the 3' end site of multiple mRNAs. When only a short base sequence close to the 3' end is available but the mRNA itself is already known, the SAGE method can often identify the mRNA, although the available base sequence is as short as about 10 bp. This method is currently in wide use as an important method for analyzing genes expressed in specific cells or tissues.

[0006]

While the SAGE method can be used to learn a partial base sequence at the 3' end site of mRNAs, it is difficult to clone new genes based on the information in such short sequences at the 3' end site alone. Despite its application, SAGE does not teach how to obtain cDNA clones close to the 5' end of the cDNA. In fact, 4 bp restriction enzymes of Class IIs are used. A 4 bp cutter usually cleaves on average every few hundred nucleotides, which is on average 1/10th of the average size of an mRNA transcript. Thus SAGE principles strongly suggest that 3' ends are collected with high prevalence, and no information can be collected about the 5' end for most of the transcripts. In addition, 10 bp tags have often been insufficient for specific gene identification and mapping to genomic sequences, i.e. entire or partial genomes. Therefore, the 10 bp tags are used to identify only a "sage-tag", which comprises a part of an mRNA. Notice that mammalian mRNA comprises only 3-5% of the transcribed part of the mammalian genome and the specific "sage-tag" comprises a subfraction of this 3-5%, which lies in proximity of the Class IIs restriction enzyme site used in the analysis. Since a 4 bp restriction enzyme cuts a random sequence approximately every 4^4 bp (256 bp), the "sage-tags" can represent approximately 1/256 of the 3-5% expressed fraction of the genome (by this calculation, less than 0.02%). Therefore, the SAGE techniques teach that essentially it is not possible to use SAGE tags to analyze a genome but only a very limited fraction of it.
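
The estimate above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes equal base frequencies and independent positions, and the variable names are hypothetical.

```python
# Worked arithmetic for the estimate in the text. Assumes equal base
# frequencies and independent positions (a simplification).
site_length = 4
expected_spacing_bp = 4 ** site_length  # one 4-bp site per 256 bp on average

# Expressed fraction of the genome quoted in the text: 3-5 %.
expressed_low, expressed_high = 0.03, 0.05

# Hypothetical names: fraction of the genome that tags next to such a
# site could represent, following the 1/256 reasoning in the text.
tag_fraction_low = expressed_low / expected_spacing_bp
tag_fraction_high = expressed_high / expected_spacing_bp

print(expected_spacing_bp)
print(f"{tag_fraction_low:.4%} to {tag_fraction_high:.4%}")
```

Under these assumptions the upper bound stays just below 0.02%, in line with the figure given in the text.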



[0007]

Accordingly, the invention claimed in this application aims to provide new means that not only enable the acquisition of information on the base sequences of 5' ends of mRNAs within a sample, but also enable the cloning of new genes and the analysis of genomic sequence information corresponding to coding and regulatory regions.

[0008]

This can include statistics on the DNA transcriptional starting site. By using concatamers to obtain information on a large number of 5' sequence tags as presented in the invention, it is possible to effectively map transcriptional start sites and the related promoter sequences. Thus the invention provides new means, where SAGE did not allow any promoter analysis due to the use of unrelated 3' ends. At the same time, there were techniques for the collection of full-length cDNA clones and sequences derived thereof; however, those focus on collecting the full-length cDNA clones and not fragments covering the 5' ends. Therefore full-length cDNA cloning approaches are not suitable for high-throughput identification and analysis of start sites of transcription and the related promoter regions. The invention offers here a novel way to combine contrasting teachings and to obtain, by a high-throughput approach, 5' ends which are useful for promoter mapping and analysis. The use of the invention to study and analyze complex regulatory networks, in combination with the ability to identify and clone new genes, opens a wide area of applications for the invention to monitor biological systems and their status in development, homeostasis, and disease.
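
As a rough illustration of how a 5' end tag could point to a transcriptional start site and the sequence upstream of it, the following sketch locates a tag in a genomic sequence by exact matching. It is a toy model under stated assumptions: only the forward strand is searched, only the first exact match is used, and the function name, the sequences and the `upstream` window are all hypothetical.

```python
from typing import Optional, Tuple

def map_tag_to_start_site(genome: str, tag: str,
                          upstream: int = 20) -> Optional[Tuple[int, str]]:
    """Return (start_index, upstream_sequence) for the first exact match.

    The match position stands in for the transcriptional start site and
    the preceding `upstream` bases for a candidate promoter region.
    Returns None when the tag is not found.
    """
    pos = genome.find(tag)
    if pos == -1:
        return None
    window_start = max(0, pos - upstream)
    return pos, genome[window_start:pos]

# Made-up sequences for illustration only.
promoter = "GGGTATAAAAGGCCGCGTT"
tag = "ACAGTCGATCGG"
genome = promoter + tag + "ATTACA"

result = map_tag_to_start_site(genome, tag, upstream=10)
print(result)
```

A practical pipeline would also have to search the reverse strand and tolerate sequencing errors and multiple matches, which this sketch leaves out.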



After devoted research, the inventors involved in this application were able to complete the present invention by finding that, by selectively collecting multiple nucleic acid fragments containing information on the base sequences at the 5' end site of the mRNAs, it is not only possible to acquire information on the base sequences in mRNAs, but it is also possible to clone new genes; and they were also able to arrive at a concrete method for attaining this goal.

[0010]

That is, the present invention provides a method for preparing concatemers of a plurality of nucleic acid fragments having information on nucleotide sequences of 5'-end regions of a plurality of mRNAs in a sample, comprising a first step of selectively collecting a plurality of first-strand cDNAs containing regions complementary to 5'-end regions of mRNAs, which cDNAs are formed by using mRNAs in the sample as templates; a second step of selectively collecting fragments containing cDNA regions including at least the regions complementary to the 5'-end regions of said mRNAs; and a third step of ligating the collected fragments to form a concatemer. The present invention also provides a method for determining nucleotide sequences of 5'-end regions of a plurality of mRNAs by sequencing said concatemer prepared by the method according to the present invention. The present invention further provides a method which is the same as the method according to any one of claims 1 to 10, except that preliminarily obtained cDNAs having complete length are used instead of carrying out said first step. The present invention still further provides the concatemer prepared by the method according to the present invention. The present invention still further provides a vector comprising said concatemer according to the present invention. The present invention still further provides sequence tags derived from said concatemers prepared according to the present invention. The present invention still further provides means to use the sequences derived from said concatemers to analyze the content of the plurality of an RNA sample. The present invention still further provides means to use the sequences derived from said concatemers to identify regions in the genome which are required for gene regulation and gene expression.
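
To make the idea of reading many tags from one sequenced concatemer more concrete, the sketch below splits concatemer reads into tags and counts them. It is a simplified illustration, not the procedure of the invention: it assumes that tags are separated by a known spacer sequence and that no tag contains that spacer. The spacer `GGATCC`, the tag sequences and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

def split_concatemer(read: str, spacer: str) -> List[str]:
    """Split one concatemer read into its tags at each spacer occurrence."""
    return [tag for tag in read.split(spacer) if tag]

def count_tags(reads: Iterable[str], spacer: str) -> Counter:
    """Count how often each tag occurs across all concatemer reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(split_concatemer(read, spacer))
    return counts

# Made-up data: two tags joined by a hypothetical spacer.
spacer = "GGATCC"
tag_a = "AAACCCAAACCCAAACCC"
tag_b = "TTTGTTTGTTTGTTTGTT"
reads = [spacer.join([tag_a, tag_b, tag_a]), tag_b]

counts = count_tags(reads, spacer)
print(counts)
```

The resulting counts give a crude measure of how often each 5' end was sampled, which is the kind of information needed for expression profiling.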



[0011]

The invention is not limited to the use of concatamers for sequencing of 5' ends; modifications at particular steps of the enrichment of 5' ends and their cloning as disclosed here allow for the individual sequencing of specific 5' ends. Such embodiments of the invention would include a modification of the first and second steps, where a linker would be used that is specifically bound to a solid matrix. The cDNA bound to the support would then be used to prepare the sequencing reactions.

[0012]

Thus the invention refers more generally to the concept of isolating portions of nucleic acids corresponding to the 5' end of transcribed genes and using them for further high-throughput analysis such as sequencing.

[0013]

As described above, the method of the present invention can comprise, but is not limited to, roughly three steps, each of which further comprises a plurality of steps. Each step will now be explained below. Concrete working examples of each step are described in detail in the later-mentioned working examples.

[0014]

Step 1

Step 1 is a step to selectively collect nucleic acids, i.e. cDNAs, containing a site corresponding to the 5' end site of mRNAs within a sample and which are synthesized, for instance, by using said mRNAs as templates.

[0015]

Either total RNA or mRNA taken from a desired cell or tissue can be used as the starting substrate. The preparation method of total RNA and mRNA is already known, and it is also described in detail in the later-mentioned working examples. In other embodiments, a full-length cDNA library may be used to isolate the 5' end nucleic acids corresponding to the 5' end of the transcribed part of the genes. Alternatively, a cDNA library itself could be cleaved if it carries a Class IIs enzyme recognition site in proximity of the 5' end.

[0016]

Step 1 itself can be conducted by a publicly known method. In other words, methods to construct full-length cDNAs and methods to synthesize cDNA fragments at least containing a site corresponding to the 5' end site of the mRNAs are already known, and any of these methods can be adopted. One of the preferable methods is the cap trapper method (e.g. Piero CARNINCI et al., METHODS IN ENZYMOLOGY, VOL. 303, pp. 19-44, 1999). This cap trapper method is explained below; however, the invention is not limited to the use of the cap trapper method, and other approaches to enrich or select full-length cDNAs could be applied as well. An alternative method (as described by Pelletier et al. in 1995) makes use of an immobilized cap-binding protein to isolate full-length cDNAs after RNase treatment of a hybrid.

[0017]

As an alternative to cap selection, one could dephosphorylate the 5' ends of mRNAs with a phosphatase, such as BAP (bacterial alkaline phosphatase), followed by treatment with the decapping enzyme TAP (tobacco acid pyrophosphatase). Subsequently a ribonucleotide or a deoxyribonucleotide can be attached with RNA ligase to the 5' end of the mRNA in place of the original cap structure (Maruyama and Sugano). In this way, for instance, a Class IIs recognition site could be placed in the oligonucleotide/ribonucleotide sequence used during the ligation step, which is placed at the 5' end of a cDNA or RNA. This Class IIs restriction enzyme can then cleave the cDNA and produce the 5' end tag.

[0018]

As an alternative to biotin, a cap-binding protein (Pelletier et al., Mol Cell Biol 1995, 15:3363-71) or an antibody that specifically binds to the cap structure can be used as the aforementioned selectively binding substance.



Alternatively, one could use methods to attach oligonucleotides chemically to the cap structure as described by Genset. This method is based on the oxidation of the cap (US patent 6,022,715). This allows (1) adding to the cap an oligonucleotide, which may contain the Class IIs enzyme site, and (2) carrying out first-strand cDNA synthesis, which is then switched to second-strand cDNA synthesis; after the second-strand synthesis, the cDNA would be cleaved with the Class IIs enzyme to make a 5' tag for subsequent formation of the concatamer.

[0020]

Alternatively, one could use the cap-switch method as described by Clontech (US patent 5,962,272). One could prepare the first-strand cDNA in the presence of a cap-switch oligonucleotide, which carries a recognition site for a substance capable of recognizing nucleic acids and cleaving them apart from said recognition sequence, such as a Class IIs restriction enzyme site. The cap-switch mechanism lets the first-strand synthesis continue onto the cap-switch oligonucleotide. This can be followed by second-strand cDNA synthesis, possibly also followed by PCR (as described for instance in the SMART™ Clontech cloning system), and finally the cDNA would be cleaved with the Class IIs enzyme to produce the 5' end tags.

[0021]

In another embodiment, when the quality of RNA allows it, one can prepare the cDNA by priming and extending the cDNA until the cap structure is reached. Particular enzymes and reaction conditions sometimes allow the cap site to be reached very efficiently (Carninci et al., Biotechniques, 2002). Even without cap selection it is possible to attach oligonucleotides in place of the cap structure, which carry Class IIs restriction enzyme sites that would later be used to produce concatamers.

[0022]

The cap trapper method first synthesizes the first-strand cDNA with a reverse transcriptase by using RNA as a template. This can be conducted by a known method. The cDNA can be primed with an oligo-dT primer or, when the template RNA is mRNA, it can be primed with a random primer. It is advisable to add trehalose to the reaction solution because it raises the efficiency of the reverse transcription reaction by stabilizing the reverse transcriptase. It is preferable to use 5-methyl-dCTP instead of standard dCTP, because it protects the cDNA from internal cleavage by several restriction enzymes and so prevents unintended cleavage to a considerable extent. In addition, after the first-strand cDNA synthesis, proteins and digested peptides might be removed by CTAB (cetyl trimethyl ammonium bromide) treatment, or by other more general methods to purify cDNA.

[0023]

Next, a selectively binding substance is bound to the cap structure of mRNA. A "selectively binding substance" here means a substance that selectively binds to a specific substance, preferably but not limited to biotin. The cap structure is the structure at the 5' end of mRNA, which does not exist in transfer RNA (tRNA) or ribosomal RNA (rRNA). Therefore, even if total RNA was used as the starting substrate, the selectively binding substance only binds to mRNA. In addition, the selectively binding substance does not bind to mRNA if the cap structure at the 5' end has been cleaved. Biotin can be bound to the cap structure by a known method. For instance, the cap structure can be biotinylated by first oxidizing the diol groups on the cap structure by treating mRNA with an oxidizer such as NaIO4 and then making them react with biotin hydrazide. Alternatively, any other methods known to a person skilled in the art of the preparation of full-length cDNAs can be utilized to selectively enrich 5' ends according to the invention.



Then, single-strand RNA is cleaved by means such as RNase I treatment. Any other RNase that can cleave single-strand RNA but not cDNA/RNA hybrids, or cocktails of RNases that cleave single-strand RNA sequences with various specificities, can be used alternatively. In an RNA/cDNA hybrid whose first-strand cDNA has not extended to the site corresponding to the 5' end site of RNA, the vicinity of the 5' end of RNA is single-stranded due to its failure to be hybridized with cDNA. Thus, the hybrid is cleaved at the single-stranded part and loses its cap structure through this step. Consequently, this step leaves only those mRNA/cDNA hybrids with cDNA that fully extends to the 5' end of mRNA to maintain the cap structure.



A matching selectively binding substance fixed to a support, which selectively binds to the aforementioned selectively binding substance, is prepared. In the present specification, a "matching selectively binding substance" means a substance that selectively binds to the aforementioned selectively binding substance, which, in the case where the selectively binding substance is biotin, would be avidin, streptavidin or a derivative thereof that binds specifically to biotin or its derivatives. The support can favorably be, but is not limited to, magnetic beads, particularly magnetic porous glass beads. Since magnetic porous glass beads to which streptavidin has been fixed are commercially available, such commercial streptavidin-fixed magnetic porous glass beads can be used favorably. Similarly, other materials such as latex beads, latex magnetic beads, agarose beads, polystyrene beads, sepharose beads or the like could be used instead of porous glass beads. Furthermore, the invention is not limited to the use of the biotin-avidin system; other binding substances could be used, like a digoxigenin tag that would be attached to the cap structure and digoxigenin-recognizing antibodies attached to a solid matrix.

[0026]

Following this, the aforementioned mRNA/cDNA hybrid with the cap structure is made to react with the aforementioned matching selectively binding substance fixed to the support in order to bind the selectively binding substance on the cap structure with the matching selectively binding substance on the support, thereby immobilizing the mRNA/cDNA hybrid with the cap structure on the support. When magnetic beads are used as the support, the magnetic beads can be quickly collected by applying a magnetic force. As mentioned above, the mRNA/cDNA hybrids that have the cap structure at this stage are only those with cDNA that fully extends to the 5' end of mRNA, so cDNAs containing a site complementary to the 5' end of mRNAs are selectively collected by this step, and Step 1 is completed. Meanwhile, in order to prevent non-specific binding to the support, it is preferable to treat the support with a large excess of DNA-free tRNA for blocking such binding before conducting this reaction. Other substances that are suitable for blocking the surface are nucleic acids or derivatives, for instance total RNA or oligonucleotides; proteins, for instance bovine serum albumin; polysaccharides, for instance glycogen, dextran sulphate, heparin or other polysaccharides. Alternatively, hybrid molecules containing parts of all of the above could be used to mask non-specific binding sites.



The above focuses on the case where Step 1 is conducted by the cap trapper method, but various other methods can also be used, as indicated, as long as they can selectively collect cDNAs containing a site complementary to the 5' end site of mRNA.

[0028]

The following Step 2 selectively collects fragments containing a cDNA site that at least contains a site complementary to the 5' end site of mRNA.

[0 0 2 9] 

First, the first-strand cDNA that has been immobilized on the support
is released. This can be done by treating the support with alkali, such
as NaOH. As an alternative to alkali, an enzymatic reaction with RNase H
(which cleaves only RNA hybridized to DNA) could be used. The alkali
treatment releases the cDNA from the mRNA/cDNA hybrid, which is bound to
the support through the cap on the mRNA, and separates the cDNA from the
mRNA so that only the first-strand cDNA remains.



Then, a linker is attached that carries a sequence that can be
recognized in a sequence-specific manner by a substance having an
enzymatic activity that cleaves the recognized DNA outside the
recognition sequence. An example of such a substance is a Class IIs
restriction enzyme.



In this embodiment, a linker that carries at least a Class IIs
restriction enzyme site and a random oligomer part at its 3' end is
ligated to the end of this first-strand cDNA that corresponds to the
5' end of the aforementioned mRNA (i.e. the 3' end of the cDNA). For the
later cloning of the 5' end sequence tags into concatamers it is
preferable, but not essential, to introduce a second recognition site
into the linker, which should be distinct from the recognition site used
for, e.g., the Class IIs restriction enzyme.



[0 0 3 0]
[0 0 3 2]

This can preferably be conducted as follows, by a method using a linker
that carries a Class IIs restriction enzyme site and a random oligomer
part (SSLLM (single strand linker ligation method), Y. Shibata et al.,
BioTechniques, Vol. 30, No. 6, pp. 1250-1254, (2001)). Class IIs
restriction enzymes are a group of restriction enzymes that cleave at
positions outside the recognition site. An example includes but is not
limited to the use of Gsu I. Gsu I treatment cleaves one of the strands
16 bp downstream from the recognition site, and the other strand 14 bp
downstream from the recognition site. Another suitable example is Mme I,
which cleaves 20 and 18 bases, respectively, away from its recognition
sequence. The random oligomer part is located at the 3' end of the
linker, and though the number of bases is not particularly restricted,
the recommended number is 5 to 9, or more preferably 5 to 6. The Class
IIs restriction enzyme site should be located close to the
aforementioned random oligomer part, so that the cleavage point falls
within the cDNA, particularly within the 5' side of the cDNA (i.e. the
3' side of the template mRNA). The linker should preferably be a
double-stranded DNA linker in which the aforementioned random oligomer
part protrudes on the 3' side and provides the binding end. In addition,
it is advisable to bind a selective binding substance such as biotin to
the linker in advance to facilitate its collection later.
[0 0 3 3] 

When the aforementioned first-strand cDNA is reacted with such a
linker, the random oligomer part of the linker hybridizes with the
3' end site of the first-strand cDNA (i.e. the 5' end site of the
template mRNA). Next, the second-strand cDNA is synthesized by using
this linker as a primer and the first-strand cDNA as a template. This
step can be conducted by a standard method.
[0 0 3 4] 

Then, the obtained double-strand cDNA is treated with the above Class
IIs restriction enzyme. This step produces a double-strand cDNA fragment
comprising a linker-derived part and a part derived from the 5' end
site of the cDNA (the 5' end site of the second-strand cDNA). For
instance, if Gsu I is used as the Class IIs restriction enzyme together
with a linker designed to place the restriction site immediately
upstream of the aforementioned random oligomer site, the obtained DNA
fragment includes a 16 bp stretch derived from the 5' end side of the
second-strand DNA (i.e. the 5' end side of the mRNA); on the
complementary strand the stretch is 14 bp. If Mme I is used, the lengths
increase to 20 and 18 bp respectively.
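The cleavage geometry described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
recognition patterns (CTGGAG for Gsu I, TCCRAC for Mme I) are taken from
the linker oligonucleotides given in the Examples, the cut distances (16
and 20 bases on the upper strand) are the ones stated in the text, and
the function name and test sequences are hypothetical.

```python
import re

# Upper-strand recognition pattern and cut distance (bases downstream of
# the site). Distances are those stated in the text; the dictionary and
# its keys are illustrative names, not part of the described method.
ENZYMES = {
    "GsuI": (re.compile("CTGGAG"), 16),
    "MmeI": (re.compile("TCC[AG]AC"), 20),  # R = A or G
}


def extract_tag(second_strand, enzyme):
    """Return the tag released from the linker end of a second-strand cDNA.

    The tag is the stretch between the end of the first recognition site
    and the upper-strand cut position. None is returned when the site is
    absent or the sequence is too short to be cut.
    """
    pattern, cut_distance = ENZYMES[enzyme]
    match = pattern.search(second_strand)
    if match is None:
        return None
    tag = second_strand[match.end():match.end() + cut_distance]
    return tag if len(tag) == cut_distance else None


# Hypothetical linker ends followed by a made-up cDNA sequence.
cdna = "ACGTACGTACGTACGTTTTTGGGG"
gsu_tag = extract_tag("AGATCTGGAG" + cdna, "GsuI")
mme_tag = extract_tag("AGATCTTCCGAC" + cdna, "MmeI")
```

The sketch looks only at the upper strand; the shorter stretch on the
complementary strand (14 or 18 bases) is not modelled.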
[0 0 3 5] 

Next, such DNA fragments are selectively collected. If a selective
binding substance (e.g. biotin) has been bound to the linker as above,
the collection can be conducted similarly to Step 1 by using a support
to which a matching selective binding substance (e.g. streptavidin) is
fixed. This procedure completes Step 2, which selectively collects
fragments containing a cDNA site, belonging to the first-strand cDNA,
which at least contains a site complementary to the 5' end site of the
aforementioned mRNA.
[0 0 3 6] 

The above explains the case where the SSLLM is used for Step 2, but
Step 2 can also be carried out by any other method as long as the method
can selectively collect fragments containing the 3' end site of the
first-strand cDNA (the 5' end site of the template mRNA). For instance,
it is possible to use an exonuclease that removes nucleotides in the
5'-3' direction at a controlled speed. Exonuclease treatment of the
first-strand cDNA for a prescribed time period leaves a single-strand
fragment comprising the 3' end site of the first-strand cDNA (the 5' end
site of the template mRNA). It is possible to obtain only the targeted
single-strand fragments by conducting treatment with a nuclease that
only cleaves double-strand fragments. These fragments can be collected,
joined with adapters and cloned.



The subsequent Step 3 forms concatamers by mutually ligating the
collected fragments. Since there are multiple mRNAs and the linker
hybridizes with the first-strand cDNA at the random oligomer part as
above, the above method can obtain fragments containing multiple cDNAs
derived from multiple mRNAs within a sample. Step 3 ligates these
multiple fragments and forms concatamers. The ligation of the cDNA
fragments can be carried out by a standard method, using commercial
ligation kits. The ligation can be reliably conducted by, but is not
limited to, a method that first introduces a second linker providing a
recognition site for a restriction enzyme distinct from the other
recognition sites used at the earlier stages, then ligates two fragments
into di-tags, and further ligates such di-tag fragments into
concatamers. The number of ligated fragments is not restricted; in
practice it can be any number above two and is preferably about 30. The
obtained concatamers are preferably, but not necessarily, amplified or
cloned by a standard method.
[0 0 3 8] 

The concatamers obtained in this way each comprise sites having the
same base sequence (except that uracil in RNA is thymine in DNA) as that
of the 5' end site of the multiple mRNAs within the sample. Although a
concatamer also comprises parts derived from the linker or linkers, the
base sequence of the linker or linkers is already known, so the part
derived from the linker or linkers and the part derived from mRNA can be
clearly distinguished by examining the base sequence of the concatamer.
Therefore, by determining the base sequence of the obtained concatamer,
it is possible to find out the base sequences at the 5' end site of
multiple mRNAs within the sample. The base sequences of a maximum of 16
or 20 bases at the 5' end site of each mRNA can be learned in the
preferred mode of using Gsu I or Mme I. Information on 16 or 20 bases
would be sufficient to identify the mRNA almost certainly on statistical
grounds and to judge whether or not it is a new mRNA. In addition, by
determining the base sequence of the concatamer, it is possible to learn
the base sequences at the 5' end site of as many mRNAs as there are
fragments in the concatamer (preferably 20 to 30), so information on the
5' end site of multiple mRNAs can be determined efficiently. The
analysis of the concatamers can be automated by the use of computer
software to distinguish between sequences derived from the 5' ends and
sequences derived from a linker or the linkers.
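As a rough illustration of how such software could separate tags from
linker-derived sequence, the following Python sketch splits a sequenced
concatamer at every occurrence of a known linker sequence and keeps the
pieces whose length fits a tag. It also computes how many distinct
sequences a tag of a given length can take, which is the basis of the
statistical argument above. The function names, the 6-base linker and
the tag sequences are hypothetical; a real concatamer of di-tags would
also contain tags in both orientations, which this sketch ignores.

```python
import re


def split_concatamer(sequence, linker, min_len=14, max_len=20):
    """Split a concatamer read on a known linker and keep tag-sized pieces."""
    pieces = re.split(re.escape(linker), sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def possible_tags(length):
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length


# Hypothetical example: three 16-base tags separated by a Bgl II site.
linker = "AGATCT"
tags_in = ["ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA", "GATTACAGATTACAGA"]
read = linker + linker.join(tags_in) + linker
tags_out = split_concatamer(read, linker)

# 16-base tags: 4**16 possibilities; 20-base tags: 4**20 possibilities.
n16 = possible_tags(16)
n20 = possible_tags(20)
```

With roughly four billion possible 16-base sequences, a random match
between two unrelated transcripts is unlikely, although real 5' ends are
not uniformly distributed over sequence space.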
[0 0 3 9] 

When a new mRNA is found from a base sequence at the 5' end, the cDNA
derived from the new mRNA can be obtained by conducting RT-PCR, using
that site as the forward primer and oligo-dT as the reverse primer. It
is also possible to amplify the mRNA by methods such as NASBA.
Accordingly, the method of the present invention can be used for the
cloning of new genes. Similarly, forward primers derived from 5' end
specific information can be used to amplify partial or full-length cDNA
fragments from existing cDNA libraries.

[0 0 4 0] 

While the above method uses mRNA or total RNA within the sample as the
starting substrate, Step 1 can be omitted by using an existing
full-length cDNA library. In this way, information on the base sequences
of the 5' end site of multiple cDNAs (i.e. the 5' end site of the mRNAs
used as templates for said cDNAs) contained in the full-length cDNA
library can be efficiently obtained similarly to the above procedure.
[0 0 4 1] 

In some embodiments it could be desirable to obtain extended sequence
information from the 5' ends of transcribed regions. Such extended
sequences may allow, in specific cases, the identification of start
sites of protein synthesis or a better mapping to genomic sequences. As
described above, the invention includes in Step 2 the ligation of a
linker to the 5' end of a cDNA. Such a linker can be modified by
introducing a single-stranded overhang encompassing a sequence obtained
from a concatamer, so that it binds to and is ligated to a specific
nucleic acid fragment. After the ligation the linker can be used to
enrich the DNA fragment by attaching the linker to a support from which
it can be released after the enrichment. The linker can further be used
as a primer to obtain extended sequence information on 5' ends.
[0 0 4 2]

By investigating the base sequences of the concatamers or extended
5' sequences obtained by the present invention, it is not only possible
to clone new genes as described above, but also possible to investigate
the expression profiles of genes within the sample. Furthermore, the
technology can be used for various purposes such as to map transcription
start sites in the genome, to map promoter usage patterns, for the
analysis of SNPs in promoter regions, for creating gene networks by
combining the expression analysis with information on promoters,
alternative promoter usage and other data, and for selective collection
of promoter sites within fragmented genomic DNA. To select genomic
fragments containing promoter sites, a fragment containing the same base
sequence as the 5' end site of mRNA could be bound to a support, e.g. by
using the aforementioned biotin system, and hybridized to fragmented
genomic DNA. Hybridized genomic DNA fragments could then be separated
from a mixture of genomic fragments by using e.g. streptavidin-fixed
magnetic beads, and cloned under standard conditions.



Alternatively, one could avoid making concatamers and use selected
5' end tags by ligating a mixture of full-length cDNAs to magnetic beads
carrying a homogeneous sequence of oligonucleotides, followed by
ligation as in the SSLLM, second-strand cDNA preparation and cleavage
with a Class IIs restriction enzyme. The 5' end specific tag would be
anchored specifically to the beads and would be used for sequencing as
done by Lynx Therapeutics (US patents 6,352,828; 6,306,597; 6,280,935;
6,265,163; 5,695,934).
[0 0 4 4] 

For instance, oligonucleotides would have a "random part I", which will
bind to the 5' ends of cDNAs, and a code part, which will be able to
"tag" the ligation product. The oligonucleotide may be destroyed by
exonuclease VII if not hybridized with a cDNA. The "decoder"
oligonucleotides would be used to select out the sequence. The specific
arrays of cDNAs on beads are then arrayed onto a solid surface, one per
position, followed by parallel sequencing. With one hole per bead,
arrays of beads carrying specific oligonucleotides can be made.



Through modifications such as the aforementioned approaches for direct
sequencing of 5' ends, the invention provides different means for the
general analysis of 5' ends in the form of concatamers or the analysis
of individual 5' ends that were enriched by means of a 5' end specific
selection.



[0 0 4 3]
The present invention will now be described by way of examples thereof.
It should be noted that the present invention is not restricted to the
Examples. The experiments described in the Examples can be performed by
any person skilled in the standard techniques of the field of molecular
biology. Unless otherwise defined in the text, the technical terms,
abbreviations, and solutions used in the Examples have the same meaning
as commonly understood by a person skilled in the art in the field of
the invention. A general description of such terms, abbreviations and
solutions can be found in the common reagent section in Molecular
Cloning (Sambrook and Russel, 2001). All publications mentioned herein
are incorporated into this document by reference to disclose and
describe the methods and/or materials therein.

[0 0 4 7] 

Example 1: 

Preparation of total RNA from tissue 

In the literature a variety of different approaches for the preparation
of RNA have been described, which are known to a person skilled in the
art. All such approaches should allow the preparation of a plurality of
RNA samples derived from biological materials including tissues and
cells, which are suitable for the invention. Below two such procedures
are described in detail.
Buffers and solutions:

a) Solution D: 4 M guanidinium thiocyanate, 25 mM sodium citrate
(pH 7.0), 100 mM 2-mercaptoethanol and 0.5% N-lauryl-sarcosine.

b) RNase-free CTAB/urea solution: 1% CTAB (Sigma), 4 M urea, 50 mM
Tris-HCl (pH 7.0), 1 mM EDTA (pH 8.0).

c) Water-equilibrated phenol as described in Molecular Cloning
(Sambrook and Russel, 2001).

Phosphate-buffered saline (PBS) as described in Molecular Cloning
(Sambrook and Russel, 2001)
5 M sodium chloride
7 M guanidinium chloride
RNase-free dd-water

[0 0 4 8] 
Protocol for total RNA preparation 

Dissect the tissue as fast as possible in a cooled dish.

Roughly evaluate the volume of tissue in a 50 ml Falcon tube. The best
quantity of tissue is 0.5-1 g per 20 ml of Solution D.

Add 2 ml of 2 M sodium acetate (pH 4.0) and 16 ml of water-equilibrated
phenol.

Mix by vortexing. Add 4 ml of chloroform and shake vigorously by hand
and by vortexing. Leave on ice for 15 min.
Centrifuge at 6,000 rpm for 30 min at 4°C.

Transfer the upper aqueous phase (25 ml) to a new tube by pipetting and
recover approximately 20 ml thereof.

Precipitate the RNA from the aqueous phase by adding 1 volume of
isopropanol (in this case, approximately 20 ml); store on ice for 1 h.
Centrifuge at 7,500 rpm for 15 min at 4°C: the RNA is pelleted by
centrifugation.

The pellet is washed twice with 70% ethanol, each time followed by
centrifugation at 7,500 rpm for 2 min, in order to remove the SCN salts.
CTAB removal of polysaccharides. Selective CTAB precipitation of mRNA is
performed after complete RNA re-suspension in 4 ml of water.
Subsequently, 1.3 ml of 5 M NaCl is added and the RNA is then
selectively precipitated by adding 16 ml of the CTAB/urea solution.

Centrifuge for 15 min at 7,500 rpm (9,500 x g), discard the aqueous
phase.
Resuspend the RNA pellet in 4 ml of 7 M guanidinium chloride.
The re-suspended RNA is finally precipitated by adding 8 ml of ethanol.
Incubate at -20°C for 1-2 hours (or longer) and centrifuge for 15 min at
7,500 rpm, 4°C. At the end, wash the pellet with 5 ml of 70% ethanol.
Centrifuge again at 7,500 rpm for 5 min.
Discard the supernatant.

Re-suspend the RNA in 500-1000 microL of RNase-free dd-water.
[0 0 4 9] 

Preparation of a mRNA fraction from total RNA 
The mRNA fraction of total RNA preparations can be isolated by the use
of commercial kits such as the MACS mRNA isolation kit (Miltenyi) or
polyA-quick (Stratagene), which provide satisfactory yields of mRNA
under the recommended conditions. One cycle of oligo-dT selection of the
mRNA is sufficient. It is advisable to redissolve the poly-A+ RNA at a
high concentration of 1 to 2 microG/microL.
[0 0 5 0] 

Preparation of a plurality of RNA samples from a cDNA library 

Alternatively, a plurality of nucleic acids corresponding to the
5' ends of genes can be obtained from existing cDNA libraries that were
cloned into expression vectors. By standard methods known to a person
skilled in molecular biology, RNA transcripts can be obtained from such
libraries by in vitro transcription reactions using e.g. a T3, T7 or SP6
RNA polymerase. Such an approach can be performed by first linearizing
the plasmid DNA with appropriate restriction endonucleases. The
restriction enzyme can be chosen to allow for the transcription of the
sense RNA. In the case of libraries obtained in the vector pFLC III
(Carninci P, Shibata Y, Hayatsu N, Itoh M, Shiraki T, Hirozane T,
Watahiki A, Shibata K, Konno H, Muramatsu M, Hayashizaki Y.
Balanced-size and long-size cloning of full-length, cap-trapped cDNAs
into vectors of the novel lambda-FLC family allows enhanced gene
discovery rate and functional analysis. Genomics, 2001 Sep;
77(1-2): 79-90), the vector can be linearized by cleavage with one of
the homing endonucleases I-Ceu I or PI-Sce I to avoid truncation of the
inserts. For the digest, mix in a tube:
Plasmid DNA                 100 microG
10x buffer                  40 microL
Restriction enzyme          100 U
ddH2O                       ad 400 microL

Incubate at the appropriate temperature for at least 2 h and analyze
1 microL of the reaction mixture by agarose gel electrophoresis. If the
digest is complete, add:
0.5 M EDTA                  8 microL
10% SDS                     8 microL
Proteinase K (10 mg/ml)     5 microL

Incubate for 15 min at 45°C before extracting the sample with
500 microL phenol/chloroform. The aqueous phase is to be re-extracted
twice with 500 microL chloroform. Finally, the linearized DNA is
precipitated with isopropanol or ethanol under standard conditions and
dissolved in 50 microL TE.
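The additions to the digest can be checked with a simple dilution
calculation (final concentration = stock concentration x added volume /
total volume). The Python sketch below is only an arithmetic aid; the
function name is hypothetical, and it assumes the 400 microL digest is
used in full, ignoring the 1 microL removed for gel analysis.

```python
def final_concentration(stock, added_volume, total_volume):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_volume / total_volume


# Volumes in microL: 400 digest + 8 EDTA + 8 SDS + 5 Proteinase K.
total = 400.0 + 8.0 + 8.0 + 5.0

edta_mM = final_concentration(500.0, 8.0, total)        # 0.5 M stock = 500 mM
sds_percent = final_concentration(10.0, 8.0, total)     # 10% stock
proteinase_k = final_concentration(10.0, 5.0, total)    # 10 mg/ml stock
```

Under these assumptions the EDTA ends up at a little under 10 mM, the
SDS at about 0.2% and the Proteinase K at about 0.12 mg/ml.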

[0 0 5 1] 
In vitro RNA synthesis:
Mix in a tube under RNase-free conditions:

Linearized plasmid DNA      20 microG
5x T7 or T3 buffer          200 microL
0.1 M DTT                   100 microL
2 mg/ml BSA                 40 microL
10 mM rNTPs                 50 microL
T7 or T3 RNA polymerase     10 microL
ddH2O                       ad 1000 microL

Incubate at 37°C for 3 to 4 h before adding:

10 mM calcium chloride      10 microL
1 U/microL DNase RQ1        5 microL

Incubate at 37°C for 20 min before adding:

0.5 M EDTA                  10 microL
10 mg/ml Proteinase K       5 microL

Incubate at 45°C for 30 min, before addition of sodium chloride to a
final concentration of 1 M. Phenol/chloroform extraction followed by
re-extraction with chloroform should be performed under standard
conditions, and the RNA transcripts can finally be collected by
isopropanol or ethanol precipitation. The pellet is to be resuspended in
200 microL of water or TE. The quality of the RNA transcripts should be
confirmed by agarose gel electrophoresis and quantification.



Buffers and solutions 

Saturated trehalose, about 80% in water (crystals will remain), low
metal content

4.9 M high purity sorbitol
Optionally: Takara GC-Taq buffer 
[0 0 5 3] 

Enzymes and buffers 

RNase H- reverse transcriptase Superscript II (Invitrogen) and buffer,
or other reverse transcriptases.
[0 0 5 4] 

Nucleic acids and oligonucleotides 

Purified, first-strand oligo-dT primer (Sequence for primer used:

5' "GAGA(^GAGA(XiATCCITCTGGAGAGimm ). Alternatively or additionally,
random primer (dNs-dNg), where N is any nucleotide.



[0 0 5 2]

2. First strand cDNA synthesis
mRNA, recommended 2.5 to 25 microG or alternatively, total RNA,
5-50 microG

[0 0 5 5] 

Radioactive compounds 

[alpha-32P] dGTP
[0 0 5 6] 

Protocol A: Trehalose-Sorbitol enhanced 

To prepare the 1st strand cDNA, put together the following reagents in
three different 0.5 ml PCR tubes (A, B, and C)
[0 0 5 7] 

Tube A: in a final volume of 21.3 microL, add the following:

mRNA                                    2.5-25 microG
or total RNA                            5-50 microG
1st strand primer (2 microG/microL)     14 microG (7 microL)
Total volume: 22 microL

Heat the mixture (mRNA, primer) at 65°C for 10 min to dissolve the
secondary structures of mRNA.

Tube B: in a final volume of 76 microL, add the following:
5X 1st strand buffer                                28.6 microL
0.1 M DTT                                           11 microL
dATP, dTTP, dGTP, and 5-methyl-dCTP, 10 mM each     9.3 microL
4.9 M sorbitol                                      55.4 microL
Saturated trehalose                                 23.2 microL
RNase H- Superscript II reverse transcriptase
(200 U/microL)                                      15.0 microL
Final volume: 142.5 microL
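Because the reagent tables are easy to mistype, it can help to check
that the listed component volumes add up to the stated final volume. The
following Python sketch does this for the tube B mix listed above and,
for comparison, for the tube B mix of Protocol B given further below.
The dictionary and function names are hypothetical; the volumes are
copied from the tables.

```python
# Component volumes in microL, copied from the tube B tables.
TUBE_B_PROTOCOL_A = {
    "5X 1st strand buffer": 28.6,
    "0.1 M DTT": 11.0,
    "dNTP mix (10 mM each)": 9.3,
    "4.9 M sorbitol": 55.4,
    "saturated trehalose": 23.2,
    "Superscript II (200 U/microL)": 15.0,
}

TUBE_B_PROTOCOL_B = {
    "2X GC I buffer": 75.0,
    "dNTP mix (10 mM each)": 4.0,
    "4.9 M sorbitol": 20.0,
    "saturated trehalose": 10.0,
    "Superscript II (200 U/microL)": 15.0,
    "ddH2O": 4.0,
}


def total_volume(components):
    """Sum the component volumes, rounded to 0.1 microL."""
    return round(sum(components.values()), 1)


volume_a = total_volume(TUBE_B_PROTOCOL_A)
volume_b = total_volume(TUBE_B_PROTOCOL_B)
```

Both sums agree with the final volumes stated in the respective tables
(142.5 and 128 microL).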
[0 0 5 8] 

Prepare a cycle (on a thermal cycler) with: 40°C, 4 min; 50°C, 2 min;
56°C, 60 min.
If total RNA is used as the starting material, prepare a cycle with:
40°C, 2 min; -0.1°C/sec to 35°C; 50°C, 2 min; 56°C, 60 min.
Alternatively: prime the cDNA with a random primer (dNg, N = any
nucleotide) at 25°C.

[0 0 5 9] 

Tube C: 



[0 0 6 0]

For a cold-start, operate as follows:
Quickly mix tubes A and B on ice.

Transfer in tube C 40 microL of the A+B mixture.

Tubes A+B and C should be transferred immediately to 40°C, step 1 of
the above cycling program, to anneal at 40°C for 4 minutes.
Let the reaction proceed following the thermal cycler setting.

[0 0 6 1] 
For a hot-start, operate as follows: 
Transfer the tubes A, B, C on the thermal cycler 
Start the cycling 

When the temperature reaches 42°C, quickly mix tubes A and B.
Transfer in tube C 40 microL of the A+B mixture. 
Let the reaction proceed following the thermal cycler setting. 
[0 0 6 2] 

Protocol B: GCI-Trehalose-Sorbitol enhanced 

Tube A: in a final volume of 22 microL, add the following:

mRNA                                    5-25 microG
(precipitate with ethanol and re-suspend directly with the primer)
or total RNA                            up to 50 microG (for the
                                        small-scale protocol)
Purified 1st strand cDNA primer
(2 microG/microL)                       14 microG (7 microL)
Final volume: 22 microL

1-1.5 microL of [alpha-32P] dGTP.
Tube B: add the following:

2 X GC I (LA Taq) buffer (TaKaRa)                       75 microL
dATP, dTTP, dGTP, and 5-methyl-dCTP, 10 mM each         4 microL
4.9 M sorbitol                                          20 microL
Saturated trehalose (approximately 80%)                 10 microL
Superscript II reverse transcriptase (200 U/microL)     15 microL
ddH2O                                                   4 microL
Final volume: 128 microL

Tube C:

alpha-32P-dGTP                                          1.5 microL

For the rest of the procedure, follow exactly the same steps as in the
normal reaction condition. Prepare (in advance) a thermal cycler with
the following cycle:

42°C, 30 min; 50°C, 10 min; 55°C, 10 min; 4°C, indefinite time.

[0 0 6 3] 
Operate as follows:

1) Transfer the tubes A, B, C on the thermal cycler

2) Start the cycling

3) When the temperature reaches 42°C, quickly mix tubes A and B.

4) Transfer in tube C 40 microL of the A+B mixture.

5) Let the reaction proceed following the thermal cycler setting.

At the end, stop the reaction with EDTA at 10 mM final concentration.

Then incorporation of [alpha-32P]dGTP is measured and the yield of cDNA
is calculated. Calculating the amount of cDNA from the incorporated
[alpha-32P]dGTP is useful for monitoring whether the processes are
proceeding correctly.
[0 0 6 4]

3. CTAB precipitation of the first-strand cDNA



Buffers and solutions

CTAB solution as described in Example 1
After measuring the radioactivity, transfer both the "hot" and "cold"
1st strand synthesis reactions (tubes B and C) to one tube and perform
CTAB precipitation as follows.

Mix tubes B and C from the first-strand synthesis; to the mixture add:
3 microL of 0.5 M EDTA (final concentration of 10 mM)
2 microL of 10 microG/microL Proteinase K.

Incubate at 45°C or 50°C for at least 15 min, and as long as 1 hour.

To the 128-142 microL volume of the first strand cDNA reaction, add:

32 microL of 5 M sodium chloride (RNase free)

320 microL of CTAB-urea solution

Incubate at room temperature for 10 min.

Centrifuge at 15,000 rpm for 10 min

Remove supernatant.

Carefully re-suspend with 100 microL of 7 M guanidinium chloride
Add 250 microL of ethanol and leave on ice or at -20 to -80°C for
30-60 min
Centrifuge at 15,000 rpm for 10 min. Remove the supernatant.
Subsequently, wash the pellet twice with 800 microL of 80% ethanol. Each
time, add 80% ethanol to the tube and centrifuge for 3 min at
15,000 rpm.

Re-suspend cDNA in water 46 microL.
[0 0 6 5]

4. Cap-trapping, oxidation and biotinylation of the cap

Buffers and solutions

1 M sodium acetate buffer, pH 4.5

1 M citrate buffer, pH 6.0

NaIO4, solution >100 mM.
SDS 10%

Biotinylation buffer: 33 mM sodium citrate, pH 6.0, and 0.33% SDS.

10 mM biotin hydrazide long arm (MW = 371.51; 3.71 mg/ml = 10 mM) in
citrate/SDS buffer.
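The conversion quoted for biotin hydrazide long arm (10 mM = 3.71 mg/ml
at MW 371.51) follows from mass concentration = molarity x molecular
weight. A minimal Python sketch of that arithmetic is given below; the
function names are hypothetical, and the 160 microL volume used in the
example is the amount of biotin hydrazide solution added per reaction in
the biotinylation step.

```python
def mg_per_ml(molarity_mM, molecular_weight):
    """Mass concentration in mg/ml for a solution of the given molarity.

    mM x g/mol gives mg/l; dividing by 1000 converts to mg/ml.
    """
    return molarity_mM * molecular_weight / 1000.0


def mass_needed_mg(molarity_mM, molecular_weight, volume_microl):
    """Mass of solute in mg needed to prepare the given volume."""
    return mg_per_ml(molarity_mM, molecular_weight) * volume_microl / 1000.0


biotin_mg_per_ml = mg_per_ml(10.0, 371.51)
biotin_mg_for_160 = mass_needed_mg(10.0, 371.51, 160.0)
```

For one 160 microL addition this corresponds to roughly 0.6 mg of
biotin hydrazide, so in practice a larger stock volume would be prepared.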
Cap biotinylation: (A) Oxidation of the diol groups of mRNA

In a final volume of 50 to 55 microL, add the following:

The re-suspended cDNA sample

3.3 microL of 1 M sodium acetate buffer, pH 4.5

A freshly prepared solution of NaIO4 to a final concentration of 10 mM
Incubate on ice in the dark for 45 min.
Finally, precipitate the cDNA:

To simplify the downstream process, add 1 microL of glycerol 80%.
Vortex.

Add 0.5 microL of 10% SDS, 11 microL of 5 M sodium chloride and
61 microL of isopropanol.

Incubate at -20 or -80°C for 30 min in the dark.

Centrifuge for 15 min at 15,000 rpm.

Remove supernatant.

Add 500 microL of 80% ethanol

Centrifuge at 15,000 rpm for 2-3 min.

Discard the supernatant

Repeat steps 12-13

Re-suspend the cDNA in 50 microL of water.

Biotinylation: (B) Derivatization of the oxidized diol groups

To the cDNA (50 microL), add 160 microL of the dissolved biotin
hydrazide long arm in the reaction buffer. Perform the reaction in
210 microL (final volume).
Incubate overnight (10-16 hours) at room temperature (22-26°C).
Subsequently, to precipitate the biotinylated cDNA, add:
75 microL of 1 M sodium citrate, pH 6.1

5 microL of 5 M sodium chloride

750 microL of absolute ethanol

Incubate on ice for 1 hour or at -80 or -20°C for 30 min or longer.
Centrifuge the sample at 15,000 rpm for 10 min
Wash the precipitate twice with 70% or 80% ethanol and centrifuge.
Discard the supernatant and repeat the wash. Dissolve the cDNA in
175 microL of TE (1 mM Tris, pH 7.5, 0.1 mM EDTA).

Cap-trapping and releasing the 5' ends of cDNA
Enzymes and buffers
RNase ONE (Promega) and its reaction buffer

To the cDNA sample add, in a final volume of 200 microL:
20 microL of RNase I buffer (Promega).

1 unit of RNase I (Promega, 5 or 10 U/microL) per 1 microG of starting
mRNA or total RNA (in case of the small-scale protocol) used for first
strand cDNA synthesis.
Incubate at 37°C for 30 min.

To stop the reaction, put the sample on ice and add
4 microL of 10% SDS and

3 microL of 10 microG/microL Proteinase K.
Incubate at 45°C for 15 min.

Extract once with 1:1 Tris-equilibrated phenol:chloroform, then load
the aqueous phase into a Microcon-100.
Perform a back extraction with water and load again into the
Microcon-Centricon 100 filter.

Perform one round of Microcon separation

8-b) Dissolve completely the pellet with 20 microL of 0.1 x TE
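The RNase I dose in this section is given per microgram of starting RNA
(1 unit per 1 microG), so the volume of enzyme to pipette depends on the
stock concentration. The small Python helper below sketches that
conversion; the function name and the 25 microG example are
illustrative only.

```python
def rnase_i_volume_microl(starting_rna_microg, stock_units_per_microl,
                          units_per_microg=1.0):
    """Volume of RNase I stock (microL) for the given amount of starting RNA."""
    units_needed = starting_rna_microg * units_per_microg
    return units_needed / stock_units_per_microl


# Example: 25 microG of starting mRNA with the two stock strengths
# mentioned in the text.
volume_at_10 = rnase_i_volume_microl(25.0, 10.0)
volume_at_5 = rnase_i_volume_microl(25.0, 5.0)
```

For 25 microG of starting mRNA this gives 2.5 microL of a 10 U/microL
stock or 5 microL of a 5 U/microL stock.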
[0 0 6 6]

Magnetic beads blocking
Materials

Streptavidin-coated MPG (CPG Inc., New Jersey)
Buffers and solutions

Binding buffer: 4.5 M NaCl, 50 mM EDTA, pH 8.0
Special equipment

A magnetic stand to hold 1.5 ml tubes is required.

To further minimize the non-specific binding of nucleic acids, magnetic
beads are pre-incubated with DNA-free tRNA (10 mg/ml).

For each preparation, pre-incubate 500 microL of magnetic beads (per
25 microG of starting mRNA) with 100 microG of tRNA.

Incubate on ice for 30 min with occasional mixing.

Separate the beads with a magnetic stand (for 3 min) and remove the
supernatant.

Wash 3 times with 500 microL of binding buffer

[0 0 6 7] 
5'-ends cDNA capture and release

To capture the full-length cDNA, mix the RNase I-treated cDNA and
washed beads as follows:

1) Re-suspend the beads in 500 microL of wash/binding buffer.

2) Transfer 350 microL of the beads into the tube containing the
biotinylated first-strand cDNA.

3) After mixing, gently rotate the tube for 10 min at 50°C.

4) Transfer 150 microL of the beads into the tube containing the
biotinylated first-strand cDNA and 350 microL of beads.

5) After mixing, gently rotate the tube for 20 min at 50°C.

Separate the beads from the supernatant on a magnetic stand.
Washing the beads

Gently wash the beads with 0.5 ml of the indicated buffer to remove the
nonspecifically adsorbed cDNAs.
2 X with washing/binding solution.

1 X with 0.3 M NaCl/ 1 mM EDTA

2 X with 0.4% SDS/ 0.5 M NaOAc/ 20 mM Tris-HCl pH 8.5/ 1 mM EDTA.
2 X with 0.5 M NaOAc/ 10 mM Tris-HCl pH 8.5/ 1 mM EDTA.

Alkali release (see below)

Alkali full-length cDNA release from beads

Add 100 microL of 50 mM NaOH, 5 mM EDTA.

Briefly stir and incubate 5 min at RT with occasional mixing.

Separate the magnetic beads and transfer the eluted cDNA on ice.

Repeat the elution cycle with 100 microL of 50 mM NaOH, 5 mM EDTA, two
more times until most of the cDNA, 80-90% as measured by monitoring the
radioactivity, can be recovered from the beads.

Adding a 5'-end primable site to the cDNA

RNase step

Enzymes and buffers

- RNase ONE™ and its buffer (Promega)

Add 50 microL of 1 M Tris-HCl, pH 7.0 in tubes on ice and mix quickly.
Add 1 microL of RNase I (10 U/microL) and mix quickly.

Incubate at 37°C for 10 min.
To remove the RNase I, treat the cDNA with Proteinase K and
phenol/chloroform extraction including back extraction.
Add 3 microG of glycogen. Treat the cDNA with one cycle of Microcon-100.
Fractionation of cDNA before adding a primable site
Materials

Amersham-Pharmacia S-400 spun kit or alternative kits
Buffers and solutions

Column buffer: 10 mM Tris, pH 8.0, 1 mM EDTA, 0.1% SDS, and 100 mM NaCl
Column buffer without SDS: 10 mM Tris, pH 8.0, 1 mM EDTA and 100 mM NaCl

S-400 spun column chromatography

Detailed protocols are described in the kits. This is the running
protocol for S-400 spun columns:
Shake the column

Break the seal and transfer to a 2 ml tube
Centrifuge at 3,000 rpm for 1 min (+4°C)
Add the cDNA (< 20 microL volume)
After the cDNA, add 80 microL of water
Centrifuge 2 min at 3,000 rpm

Concentrate by Microcon 100 or precipitate with isopropanol. Recovery
should exceed 80%.

[0 0 6 8] 
6. SSLLM 
Materials 

S-300 spun column chromatography kit (Amersham-Pharmacia) 
Buffers and solutions 

Column buffer: 10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.1% SDS, 100 mM NaCl. 

Enzymes and buffers 

Takara DNA Ligase KIT II. 

Nucleic acids and oligonucleotides 
In the Example given here, the recognition sites for the restriction enzymes Bgl II, Gsu I and Mme I are introduced; however, the invention is not dependent on or limited to the use of those restriction enzymes and their recognition sites. In particular, Bgl II (recognition site: AGATCT) can be replaced by any endonuclease suitable for cloning. Other examples of such enzymes include Asc I (recognition site: GGCGCGCC) or Xba I (recognition site: TCTAGA). 

Synthesize the following oligonucleotides containing the Gsu I restriction site. 
Oligonucleotide Bg-Gsu-GN5: 

5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTGGAGGNNNNN-3'; 
Oligonucleotide Bg-Gsu-N6: 

5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTGGAGNNNNNN-3'; 
Oligonucleotide Bg-Gsu-down: 

5'P-CTCCAGATCTAGTCACCTATTAAGCCTAGTTCTCTCTCT-NH2 3'. 

Synthesize the following oligonucleotides containing the Mme I restriction site. 

Oligonucleotide Bg-Mme-GN5: 

5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTTCCRACGNNNNN-3'; 
Oligonucleotide Bg-Mme-N6: 

5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTTCCRACNNNNNN-3'; 
Oligonucleotide Bg-Mme-down: 

5'P-GTYGGAGATCTAGTCACCTATTAAGCCTAGTTCTCTCTCT-NH2 3'. 
Where R stands for G or A and Y stands for C or T. 

P means that the oligonucleotide must be 5' phosphorylated and NH2 indicates that an amino group is added to avoid non-specific ligation and possible hairpin priming. 

Oligonucleotides should be purified by acrylamide gel electrophoresis following standard techniques, as for the first-strand cDNA primer, with 10% acrylamide electrophoresis (Sambrook and Russell, 2001). Oligonucleotides should be extracted with phenol/chloroform and chloroform and precipitated with 2 volumes of ethanol, as for the first-strand cDNA primer. 

[0 0 6 9] 
Preparation of the linkers. 

After OD checking, mix the Bg-Gsu-GN5, Bg-Gsu-N6 and "down" oligonucleotides at a ratio of 4:1:5, at a concentration of at least 2 microG/microL of DNA; add NaCl to a final concentration of 100 mM. The oligonucleotides are annealed at 65 °C for 5 min, 45 °C for 5 min, 37 °C for 10 min and 25 °C for 10 min. 

[0 0 7 0] 
Ligation of the first-strand cDNA 

Use 2 microG of linker mixture for up to 1 microG of single-strand cDNA. Mix linkers and cDNA (final volume: 5 microL). 

Heat at 65 °C for 5 min to melt secondary structures of the single-strand cDNA. 
Transfer the linker and cDNA mix on ice. 

Add 5 microL of solution II from the TAKARA DNA ligation Kit. 
Add 10 microL of solution I of the kit. 
Incubate at 10 °C overnight (at least 10 hours). 

At the end of the ligation reaction, stop the reaction by adding 1 microL of 0.5 M EDTA, 1 microL of 10% SDS, 1 microL of 10 mg/ml Proteinase K and 10 microL of water, and incubate at 45 °C for 15 min. 

Treat with phenol/chloroform and chloroform, and back extract (see appendix) with 60 microL of column buffer. 

After the ligation, remove the excess linker with S-300 spin column chromatography. 

1) Shake the column several times and then let it stand upright. 

2) Remove the upper cap, then the bottom one. 

3) Drain the buffer of the column. Apply 2 ml of the column buffer and drain twice by gravity. 
Put the column into a 15 ml centrifuge tube, then centrifuge at 400 x g for 2 min in a swing-out rotor at room temperature. 

Apply 100 microL of buffer to the column, then centrifuge at 400 x g for 2 min. Check the eluted volume. If it is different from the input (100 microL), repeat this step until the eluted volume is the same as the added one. 

Set a 1.5 ml tube, after cutting off the cap, into the 15 ml centrifuge tube, and then apply the sample to the column. Centrifuge at 400 x g for 2 min. 

Collect the eluted fraction in a separate tube. Apply 50 microL of buffer to the column, repeat the centrifugation and collect the fraction in a separate tube. 

Repeat step 6 for 3 to 5 more times; keep the eluted fractions separate. 
Collected fractions should be counted in a scintillation counter. Usually, mix the first 2-3 fractions (80% of cpm of cDNA). 

Add NaCl to a final concentration of 0.2 M and precipitate the cDNA by adding an equal volume of isopropanol. 

After precipitation and washing twice with 80% cold ethanol, re-suspend in water. 
Second-strand cDNA 

Set the 2nd strand cDNA program on the thermal cycler as follows: 
Step 1 65 °C for 5 min 
Step 2 68 °C for 30 min 
Step 3 72 °C for 10 min 
Step 4 +4 °C 

Procedure for the second strand cDNA 
Second strand steps, mix in a test tube: 



The cDNA 
6 microL of LA-Taq polymerase buffer (Takara) 
6 microL of 2.5 mM (each) dNTPs (Takara) 

0.5 microL of [alpha-32P] dGTP (optional, to follow the incorporation) 

After starting the 2nd strand program, put the tube on the thermal cycler. 

Add to the tube 3 microL of 5 U/microL LA Polymerase or an alternative thermostable polymerase cocktail when the samples are at 65 °C, during the first step. 
Mix quickly but thoroughly. 

At the end of the thermal cycler program, stop the reaction by adding 10 mM EDTA (final concentration) and clean up the reaction by Proteinase K treatment, phenol-chloroform extraction and ethanol precipitation (see Sambrook and Russell, 2001, Molecular Cloning, CSHL press, NY). 



[0 0 7 1] 

11. Cleavage of cDNA 

The cDNA should then be cleaved with a Class IIs restriction enzyme like Gsu I given in this Example. 

Buffer (10X) (MBI Fermentas) 10 microL 

Gsu I (1 U/microL) (use 5 U/microG DNA) Y microL 

ddH2O X microL 

Final volume 100 microL 

Where Y and X vary depending on the quantity of cDNA. 

1) Incubate at 37 °C for 1 hour. 

2) Add 2 microL of 0.5 M EDTA. 

3) Incubate at 65 °C for 15 min to inactivate the enzyme. 
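
The values of Y and X in the digestion table above follow from simple arithmetic. The following is a minimal sketch only: the function name, the parameter names and the idea that the cDNA sample volume is counted inside the 100 microL final volume are assumptions for illustration and are not stated in the protocol.

```python
def gsu_digest_volumes(cdna_ug, cdna_volume_ul, units_per_ug=5.0,
                       enzyme_units_per_ul=1.0, buffer_ul=10.0,
                       final_ul=100.0):
    """Return (Y, X): enzyme and water volumes in microL.

    Assumes (hypothetically) that the cDNA sample volume is part of
    the final reaction volume.
    """
    # Y: enzyme volume needed to reach 5 U per microG of DNA.
    enzyme_ul = cdna_ug * units_per_ug / enzyme_units_per_ul
    # X: water fills whatever is left of the final volume.
    water_ul = final_ul - buffer_ul - enzyme_ul - cdna_volume_ul
    if water_ul < 0:
        raise ValueError("components exceed the final reaction volume")
    return enzyme_ul, water_ul


# Example: 2 microG of cDNA supplied in 20 microL.
y, x = gsu_digest_volumes(cdna_ug=2.0, cdna_volume_ul=20.0)
print(y, x)  # 10.0 60.0
```

With larger amounts of cDNA the enzyme volume grows quickly, so the check on a negative water volume signals when the reaction would have to be scaled up or split.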



Prepare the magnetic beads 

Prepare the appropriate quantity of CPG-MPG (magnetic porous glass) beads. The same considerations made for the cap-trapper step are valid at this point. 
Prepare 200 microL of CPG-MPG beads. 
Add 5 microG of tRNA (20 mg/ml). 

Incubate at RT for 10-20 min or on ice for 30-60 min, with occasional shaking. 

Transfer the beads to a magnetic stand for 3 minutes and remove the aqueous phase. 

Wash 3 times with 1 M NaCl, 10 mM EDTA; use at least a volume equivalent to the starting volume of beads. 

Re-suspend the beads in a volume of 1 M NaCl, 10 mM EDTA equivalent to the starting volume of beads. 

[0 0 7 2] 

7. Released cDNA tags 
Mix the washed beads and the Gsu I-cut sample. 

Incubate at RT for 15 min with occasional gentle mixing. 
Let it stand on a magnetic rack for 3 min. 
Recover the supernatant. 

Rinse 4X with 500 microL of 1X B&W buffer (binding and washing buffer = 5 mM Tris, pH 7.5, 0.5 mM EDTA, and 1 M NaCl) containing 1X BSA (bovine serum albumin). 

Wash 2X with 200 microL of 1X ligase buffer (NEB). 
[0 0 7 3] 

8. Ligating linkers to bound cDNA: II linker ligation. 

In this Example a linker with a recognition site for the restriction enzyme Eco RI is used. However, the invention is not dependent on or limited to the use of Eco RI in the second linker. Any other restriction enzyme and its recognition site can be used depending on their convenience for cloning the concatamers. 
Oligonucleotides to be synthesized: 
5' -GAGAGAGAGA(TrTA(XJTGACACTATAGMGA(rrCXnXiA(^ 

5' -P-GMTTCTCAGGAaxnTCTATA(nxm 

The oligonucleotides are purified and annealed as described for the Linker 1. 

Suspend the beads in 20 microL of LoTE (1 mM Tris, pH 7.5, and 0.1 mM EDTA) and add linker II (0.4 microG/microL). 

Heat the tube at 65 °C for 5 min, then let it sit at room temperature for 15 min. 

Add 25 microL of TaKaRa ligation kit II solution II and 50 microL of solution I. 
Incubate at 16 °C overnight. 

After ligation, wash 4 times with 500 microL of 1X B&W buffer containing 1X BSA. 

Wash once with 200 microL of 1X B&W buffer and twice with 200 microL of 1X Bgl II buffer containing 1X BSA. 
[0 0 7 4] 

Release of cDNA tags using the Tagging Enzyme 
Add to the sample the following: 

LoTE X microL 

10X buffer 10 microL 

Bgl II Y microL 

Make up the volume to a total of 100 microL. 

1) Incubate at 37 °C for 1 hour, gently mixing intermittently. 

2) Place on the magnet and collect the supernatant into a new tube. The supernatant contains the released 5' end fragments. 
Raise the volume to 200 microL with LoTE. 

To 200 microL of sample (the 5' ends, tagged with linkers) add: 

133 microL 7.5 M NH4OAc 

3 microL 1 microG/microL glycogen 

340 microL isopropanol 

Incubate at -20 or -80 °C for at least 30 min. 

Spin for 20 min at 4 °C at 15,000 rpm in a micro-centrifuge. Remove the supernatant. Wash the pellet twice with 80% or 70% ethanol. Centrifuge for 3 min at 15,000 rpm and remove the ethanol wash. At the end, re-suspend in 10 microL LoTE. 
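
The isopropanol precipitation used here (200 microL sample, 133 microL 7.5 M NH4OAc, 3 microL glycogen, 340 microL isopropanol) appears to follow a fixed proportion that can be rescaled for other sample volumes. The sketch below is an inference from the stated numbers, not a rule given in the protocol: it assumes about two thirds of a sample volume of NH4OAc, and an isopropanol volume equal to the combined aqueous volume rounded up to the next 10 microL. The function name is hypothetical.

```python
import math


def isopropanol_precipitation(sample_ul, glycogen_ul):
    """Return (NH4OAc volume, isopropanol volume) in microL.

    Proportions are inferred from the recipe in the text and are an
    assumption, not a stated rule.
    """
    # About 2/3 sample volume of 7.5 M NH4OAc.
    salt_ul = round(sample_ul * 2 / 3)
    # Roughly one volume of isopropanol relative to the aqueous mix,
    # rounded up to the next 10 microL.
    aqueous_ul = sample_ul + salt_ul + glycogen_ul
    isopropanol_ul = math.ceil(aqueous_ul / 10) * 10
    return salt_ul, isopropanol_ul


print(isopropanol_precipitation(200, 3))  # (133, 340)
print(isopropanol_precipitation(100, 5))  # (67, 180)
```

Both outputs agree with the two precipitation recipes given in this protocol, which is the only evidence for the inferred proportions.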

[0 0 7 5] 
Ligating tags to form di-tags 
The 5' ends of cDNAs are ligated to form di-tags. 

1) Add 10 microL of TaKaRa ligation Kit II solution II and 20 microL of solution I. 

2) Incubate overnight at 16 °C. 

3) Add 10 microL of ddH2O, 1 microL of 0.5 M EDTA, 1 microL of 10% SDS and 1 microL of 10 microG/microL Proteinase K. 

4) Incubate at 45 °C for 15 min. 

5) Extract once with 1:1 Tris-equilibrated phenol:chloroform and once with chloroform, back-extract, and keep the aqueous phase. 

6) Remove the smallest cDNA fragments with a G-50 spun column (size exclusion). 

7) Precipitate with isopropanol, adding 5 microG of glycogen as carrier. 

100 microL sample 
67 microL 7.5 M NH4OAc 
5 microL glycogen 
180 microL isopropanol 

8) Spin for 20 min at 4 °C. 

9) Wash twice with 80% or 70% ethanol, centrifuge and remove the ethanol. 

[0 0 7 6] 

12. Cleavage of cDNA with anchoring enzyme 

1) Re-suspend the sample in 5 microL of LoTE. Then add, in order: 

LoTE X microL 

10X EcoRI restriction buffer 5 microL 

EcoRI Y microL (use 20 Units of EcoRI) 

Bring up the volume to a total of 50 microL. 

2) Incubate at 37 °C for 1 hour. 

3) Add 1 microL of 0.5 M EDTA, 1 microL of 10% SDS and 1 microL of 10 microG/microL Proteinase K. 

4) Incubate at 45 °C for 15 min. 

5) Extract once with 1:1 Tris-equilibrated phenol:chloroform and once with chloroform, back-extract, and keep the aqueous phase. 

6) Precipitate with isopropanol, adding 5 microG of glycogen as carrier. 

100 microL sample 
67 microL 7.5 M NH4OAc 
5 microL glycogen 
180 microL isopropanol 

8) Spin for 20 min at 4 °C. 

9) Wash twice with 80% or 70% ethanol, centrifuge and remove the ethanol wash each time. 

[0 0 7 7] 
11. Ligation of di-tags to form concatemers 

1) Re-suspend in 5 microL of LoTE. 

2) Add 5 microL of TaKaRa ligation kit II solution II and 10 microL of solution I. 

3) Incubate for 1.5 hours at 16 °C. 

4) Add 1 microL of 0.5 M EDTA, 1 microL of 10% SDS and 1 microL of 10 microG/microL Proteinase K. 

5) Incubate at 45 °C for 15 min. 

6) Extract once with 1:1 Tris-equilibrated phenol:chloroform and once with chloroform, back-extract, and keep the aqueous phase. 

7) Precipitate with isopropanol, adding 5 microG of glycogen as carrier. 

100 microL sample 
67 microL 7.5 M NH4OAc 
5 microL glycogen 
180 microL isopropanol 

8) Spin for 20 min at 4 °C. 

9) Wash twice with 80% or 70% ethanol, centrifuge and remove the ethanol. 

Dissolve in 5 microL of ddH2O. 

[0 0 7 8] 
Example 2: 

The above-obtained concatamers are to be further ligated into a cloning vector such as pBluescript II KS+ (Stratagene). A large variety of cloning vectors are known in the field, which can be used for the invention. 
Standard Ligation: 

Mix a three-fold excess of concatamer DNA and 100 ng of an appropriate vector linearized with Eco RI in a volume of 5 microL. Then add 5 microL of Solution I of DNA Ligation Kit Ver.2 (Takara) to the insert/vector mixture. Incubate the tube at 16 °C for 12-16 h. 
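
If the three-fold excess is read as a molar excess (an assumption; the text does not say whether the excess is by mass or by moles), the mass of concatamer DNA to add depends on the relative lengths of insert and vector. A minimal sketch, with a hypothetical function name and placeholder lengths that are not taken from the text:

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Equal masses of DNA contain moles in inverse proportion to length,
    so the insert mass scales with insert_bp / vector_bp.
    """
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Placeholder values: 100 ng of a 3,000 bp vector and concatamers
# averaging 500 bp (both lengths are illustrative assumptions).
print(insert_mass_ng(vector_ng=100, vector_bp=3000, insert_bp=500))  # 50.0
```

Because concatamers are a population of different lengths, an average length estimated from a gel would have to stand in for `insert_bp`.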

[0 0 7 9] 
Transformation: 

To remove salt from the ligation solution, precipitate the DNA after the addition of 2 microG of Glycogen (Roche), 20 mM sodium chloride and 80% ethanol. The DNA pellet is washed twice with 150 microL of 80% ethanol, and the pellet is then dissolved in 10 microL of water. Using 1 microL of desalted ligation solution, ElectroMAX™ DH10B™ Cells (Invitrogen) are transformed using a Cell-Porator or the like (Biometra) according to the transformation procedures described in the manufacturer's manual. Transformed bacteria are plated on a selective medium and grown overnight. Positive clones are to be isolated from those plates for further characterization of the concatamers. 



[0 0 8 0] 

Example 3: Sequencing of concatemers 

Sequencing of concatamers is performed using primers nested in the flanking regions of the cloning vector and a BigDye Terminator Cycle Sequencing Ready Reaction Kit v2.0 (Applied Biosystems) and an ABI3700 (Applied Biosystems) sequencer according to the manufacturer's product descriptions. The concatamers are sequenced from both ends to cover their entire sequence. 

[0 0 8 1] 

Example 4: Identification of 5'-end sequence tags 

The sequences obtained from concatamers are characterized by the structure of the di-tags as presented in Figure 5. Defined regions holding the recognition sites for the restriction enzymes used during the cloning steps flank each 5' end specific sequence tag. Therefore the 5' end specific sequence tags can be identified by a manual sequence analysis or by an automated process using an appropriate computer program. Individual 5' end specific sequence tags can be stored in a computer file or a database system. 
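
An automated identification of this kind can be sketched as a search for the recognition site that flanks each tag. The sketch below is illustrative only and rests on assumptions not confirmed by the text: that each tag lies directly 3' of a Gsu I site (CTGGAG), that a tag is 16 nt long, and that tags may occur on either strand of the read. The exact di-tag layout is defined by Figure 5, which is not reproduced here, and the function names are hypothetical.

```python
import re

GSU_SITE = "CTGGAG"   # Gsu I recognition site
TAG_LEN = 16          # assumed tag length


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def extract_tags(read, site=GSU_SITE, tag_len=TAG_LEN):
    """Collect the tag_len bases that follow each site on both strands."""
    read = read.upper()
    # A lookahead lets overlapping site occurrences all be reported.
    pattern = re.compile(f"(?={site}([ACGT]{{{tag_len}}}))")
    tags = []
    for strand in (read, reverse_complement(read)):
        tags.extend(m.group(1) for m in pattern.finditer(strand))
    return tags


# Toy read: Bgl II/Gsu I linker end, one 16 nt tag, Eco RI site.
read = "AGATCTGGAG" + "ACCTCCCTCCGCGGAG" + "GAATTC"
print(extract_tags(read))  # ['ACCTCCCTCCGCGGAG']
```

A production version would also have to tolerate sequencing errors within the recognition sites, which this exact-match sketch does not attempt.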

[0 0 8 2] 

Example 5: Characterization of 5'-end sequence tags 

5' end specific sequence tags can be analyzed for their identity by standard software solutions that perform sequence alignments, like NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/), FASTA, available in the Genetics Computer Group (GCG) package from Accelrys Inc. (http://www.accelrys.com/), or the like. Such software solutions allow for an alignment of 5' end specific sequence tags among one another to identify unique or non-redundant tags, which can be further used in 
Database searches 

Building a 5'-end sequence database 
Gene identification using a 5'-end sequence database 
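
As a simple stand-in for the alignment-based comparison of tags among one another, identical tags can be collapsed and counted by exact string matching. This is a deliberately simplified sketch (the function name is hypothetical, and exact matching replaces the alignment step named in the text, so tags differing by a single sequencing error stay separate):

```python
from collections import Counter


def tag_counts(tags):
    """Count occurrences of each distinct tag, ignoring letter case."""
    return Counter(tag.upper() for tag in tags)


tags = ["ACCTCCCTCCGCGGAG", "acctccctccgcggag", "GGGTTTAAACCCGGGT"]
counts = tag_counts(tags)
print(len(counts))                 # 2
print(counts["ACCTCCCTCCGCGGAG"])  # 2
```

The keys of the resulting counter form the non-redundant tag set, and the counts give the observed frequency of each tag.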
An example of a BLAST search in GenBank using a 5' end specific tag is given below: The 16 bp tag (5'-ACC TCC CTC CGC GGA G) is derived from the 5' end of Human TGF-b1: JBC 264 (1989) 402-408. 
Query= (16 letters) (ACCTCCCTCCGCGGAG) 

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences) 

1,205,903 sequences; 5,297,768,116 total letters 

                                                                 Score    E 
Sequences producing significant alignments:                      (bits) Value 

gi|10863872|ref|NM_000660.1| Homo sapiens transforming grow...    32    1.1 
gi|18590091|ref|XM_085882.1| Homo sapiens similar to transf...    32    1.1 
gi|11424057|ref|XM_008912.1| Homo sapiens transforming grow...    32    1.1 
gi|7684381|gb|AC011462.4|AC011462 Homo sapiens chromosome 1...    32    1.1 
gi|15027087|emb|AL389894.4|LMFLCHR4A Leishmania major Fried...    32    1.1 
gi|1943914|gb|U70540.1|LMU70540 Leishmania mexicana amazone...    32    1.1 
gi|37097|emb|X05839.1|HSTGFBG1 Human transforming growth fa...    32    1.1 
gi|37092|emb|X02812.1|HSTGFB1 Human mRNA for transforming g...    32    1.1 
gi|340526|gb|J04431.1|HUMTGFB1PR Homo sapiens transforming ...    32    1.1 
gi|18858696|ref|NM_131728.1| Danio rerio forkhead box C1a (...    30    4.2 
gi|12004937|gb|AF219949.1|AF219949 Danio rerio forkhead tra...    30    4.2 
gi|193604|gb|M13366.1|MUSGPDX Mouse glycerophosphate dehydr...    30    4.2 
gi|193601|gb|M25558.1|MUSGPD Mouse glycerol-3-phosphate deh...    30    4.2 
gi|63465|emb|V00414.1|GGHI01 Gallus gallus mRNA coding for ...    30    4.2 
gi|63444|emb|X13894.1|GGH2AF Chicken histone H2A.F gene           30    4.2 

Alignments 

>gi|10863872|ref|NM_000660.1| Homo sapiens transforming growth factor, beta 1 
(Camurati-Engelmann disease) (TGFB1), mRNA 
Length = 2745 

Score = 32.2 bits (16), Expect = 1.1 
Identities = 16/16 (100%) 
Strand = Plus / Plus 

Query: 1  acctccctccgcggag 16 
          |||||||||||||||| 
Sbjct: 1  acctccctccgcggag 16 

>gi|18590091|ref|XM_085882.1| Homo sapiens similar to transforming growth factor, beta 1 
(H. sapiens) (LOC147760), mRNA 
Length = 697 

Score = 32.2 bits (16), Expect = 1.1 
Identities = 16/16 (100%) 
Strand = Plus / Plus 

Query: 1  acctccctccgcggag 16 
          |||||||||||||||| 
Sbjct: 7  acctccctccgcggag 22 

>gi|11424057|ref|XM_008912.1| Homo sapiens transforming growth factor, beta 1 (TGFB1), mRNA 
Length = 2741 

Score = 32.2 bits (16), Expect = 1.1 
Identities = 16/16 (100%) 
Strand = Plus / Plus 

Query: 1  acctccctccgcggag 16 
          |||||||||||||||| 
Sbjct: 1  acctccctccgcggag 16 

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences) 

Posted date: Apr 9, 2002 10:59 AM 
Number of letters in database: 1,002,800,820 
Number of sequences in database: 1,205,903 

Lambda K H 
1.37 0.711 1.31 

Gapped 
Lambda K H 
1.37 0.711 1.31 

Matrix: blastn matrix: 1 -3 
Gap Penalties: Existence: 5, Extension: 2 
Number of Hits to DB: 6901 
Number of Sequences: 1205903 
Number of extensions: 6901 
Number of successful extensions: 1479 
Number of sequences better than 10.0: 16 
length of query: 16 
length of database: 5,297,768,116 
effective HSP length: 15 
effective length of query: 1 
effective length of database: 5,279,679,571 
effective search space: 5279679571 
effective search space used: 5279679571 
T: 0 
A: 30 
X1: 6 (11.9 bits) 
X2: 15 (29.7 bits) 
S1: 12 (24.3 bits) 
S2: 15 (30.2 bits) 

LOCUS       NM_000660    2745 bp    mRNA    linear   PRI 13-FEB-2002 
DEFINITION  Homo sapiens transforming growth factor, beta 1 (Camurati-Engelmann 
            disease) (TGFB1), mRNA. 
ACCESSION   NM_000660 
VERSION     NM_000660.1  GI:10863872 
KEYWORDS 
SOURCE      human. 
  ORGANISM  Homo sapiens 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

REFERENCE   1 (bases 1 to 2745) 
  AUTHORS   Derynck,R., Jarrett,J.A., Chen,E.Y., Eaton,D.H., Bell,J.R., 
            Assoian,R.K., Roberts,A.B., Sporn,M.B. and Goeddel,D.V. 
  TITLE     Human transforming growth factor-beta complementary DNA sequence 
            and expression in normal and transformed cells 
  JOURNAL   Nature 316 (6030), 701-705 (1985) 
  MEDLINE   85296301 
REFERENCE   2 (bases 1 to 2745) 
  AUTHORS   Sporn,M.B., Roberts,A.B., Wakefield,L.M. and Assoian,R.K. 
  TITLE     Transforming growth factor-beta: biological function and chemical 
            structure 
  JOURNAL   Science 233 (4763), 532-534 (1986) 
  MEDLINE   86261803 
  PUBMED    3487831 
REFERENCE   3 (bases 1 to 2745) 
  AUTHORS   Chang,N.S., Mattison,J., Cao,H., Pratt,N., Zhao,Y. and Lee,C. 
  TITLE     Cloning and characterization of a novel transforming growth 
            factor-beta1-induced TIAF1 protein that inhibits tumor necrosis 
            factor cytotoxicity 
  JOURNAL   Biochem. Biophys. Res. Commun. 253 (3), 743-749 (1998) 
  MEDLINE   99119079 
  PUBMED    9918798 
REFERENCE   4 (bases 1 to 2745) 
  AUTHORS   Ghadami,M., Makita,Y., Yoshida,K., Nishimura,G., Fukushima,Y., 
            Wakui,K., Ikegawa,S., Yamada,K., Kondo,S., Niikawa,N. and Tomita,H. 
  TITLE     Genetic mapping of the Camurati-Engelmann disease locus to 
            chromosome 19q13.1-q13.3 
  JOURNAL   Am. J. Hum. Genet. 66 (1), 143-147 (2000) 
  MEDLINE   20100617 
  PUBMED    10631145 
REFERENCE   5 (bases 1 to 2745) 
  AUTHORS   Vaughn,S.P., Broussard,S., Hall,C.R., Scott,A., Blanton,S.H., 
            Milunsky,J.M. and Hecht,J.T. 
  TITLE     Confirmation of the mapping of the Camurati-Englemann locus to 
            19q13.2 and refinement to a 3.2-cM region 
  JOURNAL   Genomics 66 (1), 119-121 (2000) 
  MEDLINE   20304762 
  PUBMED    10843814 
REFERENCE   6 (bases 1 to 2745) 
  AUTHORS   Lim,J.M., Kim,J.A., Lee,J.H. and Joo,C.K. 
  TITLE     Downregulated expression of integrin alpha6 by transforming growth 
            factor-beta(1) on lens epithelial cells in vitro 
  JOURNAL   Biochem. Biophys. Res. Commun. 284 (1), 33-41 (2001) 
  MEDLINE   21268957 
  PUBMED    11374867 
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final 
            NCBI review. The reference sequence was derived from X02812.1. 
FEATURES             Location/Qualifiers 
     source          1..2745 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
                     /chromosome="19" 
                     /map="19q13.1" 
     gene            1..2745 
                     /gene="TGFB1" 
                     /note="TGFB; DPD1; CED" 
                     /db_xref="LocusID:7040" 
                     /db_xref="MIM:190180" 
     misc_feature    37..113 
                     /note="pot. hairpin loops-forming region" 
     variation       72 
                     /allele="-" 
                     /allele="C" 
                     /db_xref="dbSNP:1800999" 
     variation       79 
                     /allele="-" 
                     /allele="C" 
                     /db_xref="dbSNP:1799753" 
     CDS             842..2017 
                     /gene="TGFB1" 
                     /note="transforming growth factor, beta 1; diaphyseal 
                     dysplasia 1, progressive (Camurati-Engelmann disease)" 
                     /codon_start=1 
                     /db_xref="LocusID:7040" 
                     /db_xref="MIM:190180" 
                     /product="transforming growth factor, beta 1 
                     (Camurati-Engelmann disease)" 
                     /protein_id="NP_000651.1" 
                     /db_xref="GI:10863873" 

/translat ion="lffPS(mJJ'llLPLLWLLVLTPGPPAAGLSTCKTID 

MELVKRK 

RIEAIRGQILSKU^USPPSQGEWPGPLPEAVLALYNSTRDRVAGESAEFEPEPEAD 
YYAKEVTRVIitfVETlMra)KFKQSTHSIWFNTSELR^ 
UOJCVEQHVELYQKYSNNSTOYI^NRIIAPSDSPEWI^FDTOVVRQ^ 
RI^AHCSCDSRDNTLQVDINGFmRRGDLATffl(M<RPFLIJM^ 
HRRAII)TNYCFSSTEKNCCWQLYIDFRKDIX;MnHEPKGYHAMiX:ii}PCPYI^^ 
TPYSKVyONQHNPGASAAPCCVPQAI^IPIVYYVGRKPKVEQI^ 
     misc_feature    863..910 
                     /note="pot. core sequence of signal peptide (aa -272 to 
                     -257)" 
     variation       870 
                     /allele="C" 
                     /allele="T" 
                     /db_xref="dbSNP:1982073" 
     variation       915 
                     /allele="C" 
                     /allele="G" 
                     /db_xref="dbSNP:1800471" 
     misc_feature    938..1600 
                     /note="TGFb_propeptide; Region: TGF-beta propeptide" 
     misc_feature    953 
                     /note="pot. altern. translation start site" 
     misc_feature    1035..1043 
                     /note="put. glycosylation site" 
     misc_feature    1247..1255 
                     /note="put. glycosylation site" 
     misc_feature    1370..1378 
                     /note="put. glycosylation site" 
     variation       1632 
                     /allele="C" 
                     /allele="T" 
                     /db_xref="dbSNP:1800472" 
     mat_peptide     1679..2014 
                     /product="mature TGF-beta (aa 1-112)" 
     misc_feature    1715..2014 
                     /note="TGF-beta; Region: Transforming growth factor beta 
                     like domain" 
     misc_feature    1721..2014 
                     /note="TGFB; Region: Transforming growth factor-beta 
                     (TGF-beta) family" 
     misc_feature    2018..2096 
                     /note="GC-rich region" 
     promoter        2097..2103 
                     /note="TATA-box-like region" 
     misc_feature    2517..2522 
                     /note="put. polyadenylation signal" 
     polyA_site      2539 
                     /note="polyadenylation site" 
BASE COUNT      527 a    938 c    801 g    479 t 
ORIGIN 

1 acctccctcc gcggagcagc cagacagcga gggccccggc cgggggcagg g^gacg 



ccc 



egg 



gcc 



aga 



cct 



ttt 



cat 



cct 



ate 



ttc 



ctg 



cag 



ccc 



61 egtccggggc accccccccg gctctgagcc gcccgcgggg ccggcctcgg cceggag 
121 aggaaggagt cgccgaggag cagcctgagg ccecagagtc tgagacgagc cgccgcc 
181 cecgccactg cggggaggag ggggaggagg agcgggagga gggacgagct ggtcggg 
241 agaggaaaaa aacttttgag acttttcegt tgcegctggg agceggaggc gcgggga 
301 cttggcgcga cgctgccccg egaggaggca ggacttgggg accccagacc gectccc 
361 gccgccgggg aegcttgcte cctecctgcc ccctacacgg cgteectcag gcgcccc 
421 tccggaccag ccctcgggag tcgccgaccc ggcctcccgc aaagactttt ccccaga 
481 cgggcgcacc ccctgcacgc cgccttcatc cccggcctgt ctcctgagce cccgcgc 
541 ctagaccctt tctcctccag gagacggate tctctccgac ctgeeacaga tccccta 
601 aagaccaccc accttctggt accagatcgc gcccatctag gttatttccg tgggata 
661 agacacccce ggtccaagcc teccctccac cactgcgccc ttctccctga ggagcet 
721 ctttccctcg aggecctcct accttttgec gggagacccc cagcccctgc aggggcg 
781 cetccccace acaccagccc tgttcgcgct ctcggcagtg cc^ggggcg ccgcctc 
841 catgccgccc tccgggctgc ^ctgctgcc gctgctgcta ccgctgctgt ggctact 

ggt 

901 gctgacgcct ggcccgccgg ccgcgggact atccacctgc aagactatcg acatgga 

get 

961 ggtgaagcgg aagcgcatcg aggccatccg cggccagatc ctgtccaagc tgc^ct 

cgc 

1021 cagccccccg agccagg^ aggtgccgcc cggcccgctg cccgaggccg tgctcgc 

cct 

1081 gtacaacagc acccgcgacc ^t^ccgg ggagagtgca gaaccggagc ccgagcc 

tga 

1141 ggccgactac tacgccaagg aggtcacccg cgtgctaatg gtggaaaccc acaacga 

aat 

1201 ctatgacaag ttcaagcaga gtacacacag catatatatg ttcttcaaca catcaga 

get 

1261 ccgagaagcg gtacctgaac ccgtgttgct ctcccgggca gagctgcgtc tgctgag 

gag 

1321 gctcaagtta aaagtggagc agcacgtgga gctgtaccag aaatacagca acaattc 

ctg 

1381 gcgatacctc agcaaccggc tgctggcacc cagcgactcg ccagagtggt tatcttt 

tga 

1441 tgtcacc^ gttgtgcggc agt^ttgag ccgtgga^ gaaattgagg gctttcg 

cct 

1501 tagcgcccac tgctcctgtg acagcaggga taacacactg caagtggaca tcaacgg 

gtt 

1561 cactaccggc cgccga^tg acctggccac cattcat^c atgaaccggc ctttcct 

get 

1621 tcteatggcc acceegetgg aga^gccea geatctgeaa ageteccggc accgceg 

age 

1681 ectggacacc aactattgct tcagctceac ggagaagaac tgctgegtgc ggcagct 
gta 

1741 cattgacttc cgcaaggacc tcggctggaa gtggatccac gagcccaagg gctacca 

tgc 

1801 caacttctgc ctcg^ccct gcccctacat ttggagcctg gacacgcagt acagcaa 

ggt 

1861 cctggccctg tacaaccagc ataacccg^ cgcctcggcg gcgccgtgct gcgtgcc 

gca 

1921 ggcgctggag ccgctgccca tcgtgtacta cgtgggccgc aagcccaagg tggagca 

get 

1981 gtccaacatg atcgtgcgct cctgcaagtg cagctgaggt cccgccccgc cccgccc 

cgc 

2041 cccggcaggc ccggccccac cccgccccgc ccccgctgcc ttgcccatgg g^ctgt 

att 

2101 taaggacacc gtgccccaag cccacctggg gccccattaa agatggagag aggactg 

egg 

2161 atctctgtgt cattgggcge etgcctgggg tetccatccc tgacgttcec ceactcc 

cac 

2221 tccctctctc tccctctctg cctcctcctg cctgtctgca ctattccttt geccggc 

ate 

2281 aaggcaea^ ggaccagtgg ^acactae tgtagttaga tetatttatt gagcacc 

ttg 

2341 ggcactgttg aagtgcctta cattaatgaa eteatteagt caccatagca acactct 

gag 

2401 atggcaggga etctgataac acccatttta aaggttga^ aaacaagece agagagg 

tta 

2461 aggga^agt tcctgeccac caggaacctg ctttagtggg ggatagtgaa gaagaca 

ata 

2521 aaagatagta gttcaggcea ^cgg^tgc tcacgcctgt aatcctagea ctttt^ 

gag 

2581 gcagagat^ gaggatactt gaatcca^c atttgagacc agcctg^ta acatagt 

gag 

2641 accctatctc tacaaaacac ttttaaaaaa tgtacacctg tggtcccagc tactctg 

gag 

2701 gctaa^tgg gaggatcact tgatcctggg a^tcaa^c tgcag 

// 


Revised: October 24, 2001. 

Query= (16 letters) 
Database: GenBank Human EST entries 
4,280,058 sequences; 2,114,234,064 total letters 

                                                                 Score    E 
Sequences producing significant alignments:                      (bits) Value 

gi|19365764|gb|BM915385.1|BM915385 AGENCOURT_6701642 NIH_MG...    32    0.41 
gi|19353768|gb|BM903897.1|BM903897 AGENCOURT_6696012 NIH_MG...    32    0.41 
gi|18807810|gb|BM562052.1|BM562052 AGENCOURT_6562015 NIH_MG...    32    0.41 
gi|18791603|gb|BM553137.1|BM553137 AGENCOURT_6572574 NIH_MG...    32    0.41 
gi|16171065|gb|BI908151.1|BI908151 603067456F1 NIH_MGC_118 ...    32    0.41 
gi|15759271|gb|BI767693.1|BI767693 603060648F1 NIH_MGC_122 ...    32    0.41 
gi|15343643|gb|BI518851.1|BI518851 603061760F1 NIH_MGC_118 ...    32    0.41 
gi|14309343|gb|BG899094.1|BG899094 H0A21-1-G9 HOA (Human Os...    32    0.41 
gi|13662542|gb|BG611171.1|BG611171 602612144F1 NIH_MGC_60 H...    32    0.41 
gi|12609210|gb|BG115704.1|BG115704 602317174F1 NIH_MGC_88 H...    32    0.41 
gi|12101282|gb|BF796228.1|BF796228 602258513F1 NIH_MGC_85 H...    32    0.41 
gi|11152079|gb|BF238160.1|BF238160 601811886F1 NIH_MGC_48 H...    32    0.41 
gi|11100313|gb|BF206727.1|BF206727 601871105F1 NIH_MGC_19 H...    32    0.41 
gi|11100272|gb|BF206686.1|BF206686 601871051F1 NIH_MGC_19 H...    32    0.41 
gi|16775383|gb|BM046103.1|BM046103 603625849F1 NIH_MGC_40 H...    30    1.6 
gi|19739174|gb|BQ014273.1|BQ014273 UI-H-ED1-axs-h-21-0-UI.s...    28    6.4 
gi|19378603|gb|BM928224.1|BM928224 AGENCOURT_6699855 NIH_MG...    28    6.4 
gi|19367808|gb|BM917429.1|BM917429 AGENCOURT_6606724 NIH_MG...    28    6.4 
gi|19364214|gb|BM913835.1|BM913835 AGENCOURT_6612786 NIH_MG...    28    6.4 
gi|19361343|gb|BM910964.1|BM910964 AGENCOURT_6615957 NIH_MG...    28    6.4 
gi|18505954|gb|BM456914.1|BM456914 AGENCOURT_6404253 NIH_MG...    28    6.4 
gi|18499709|gb|BM450669.1|BM450669 AGENCOURT_6394717 NIH_MG...    28    6.4 
gi|16000196|gb|BI859449.1|BI859449 603388188F1 NIH_MGC_87 H...    28    6.4 
gi|15928460|gb|BI818193.1|BI818193 603032663F1 NIH_MGC_115 ...    28    6.4 
gi|15431547|gb|BI544235.1|BI544235 603241605F1 NIH_MGC_95 H...    28    6.4 
gi|15345229|gb|BI520437.1|BI520437 603071622F1 NIH_MGC_119 ...    28    6.4 
gi|14440373|gb|BI033747.1|BI033747 PM3-NN0223-220201-014-h0...    28    6.4 
gi|14426676|gb|BI020046.1|BI020046 CM3-MT0291-110101-622-f0...    28    6.4 
gi|14081325|gb|BG770672.1|BG770672 602734012F1 NIH_MGC_49 H...    28    6.4 
gi|13546630|gb|BG547965.1|BG547965 602576071F1 NIH_MGC_77 H...    28    6.4 
gi|13030375|gb|BG281450.1|BG281450 602401966F1 NIH_MGC_20 H...    28    6.4 
gi|12951460|emb|AL582959.1|AL582959 AL582959 LTI_NFL010_BC2...    28    6.4 
gi|12764352|gb|BG254536.1|BG254536 602368464F1 NIH_MGC_91 H...    28    6.4 
gi|12378592|gb|BF961317.1|BF961317 PM3-NN0223-111200-004-d0...    28    6.4 
gi|12374538|gb|BF957263.1|BF957263 PM3-NN0223-241100-002-b0...    28    6.4 
gi|12323114|gb|BF926150.1|BF926150 CM2-NT0193-301100-562-a1 ...   28    6.4 
gi|12259862|gb|BF869732.1|BF869732 IL3-ET0114-251000-316-A1...    28    6.4 
gi|12129894|gb|BF800905.1|BF800905 PM1-CI0110-201000-003-f0...    28    6.4 
gi|12071436|gb|BF744760.1|BF744760 QV2-BT0635-311000-440-c1...    28    6.4 
gi|11770407|gb|BE965733.2|BE965733 601659792R1 NIH_MGC_70 H...    28    6.4 
gi|11766539|gb|BE963121.2|BE963121 601656923R1 NIH_MGC_67 H...    28    6.4 
gi|10348536|gb|BE890328.1|BE890328 601431783F1 NIH_MGC_72 H...    28    6.4 
gi|10142985|gb|BE728993.1|BE728993 601562251F1 NIH_MGC_20 H...    28    6.4 
gi|10095527|gb|BE707262.1|BE707262 PM1-HT0452-060700-008-e0...    28    6.4 
gi|9772196|gb|BE543551.1|BE543551 601070523F1 NIH_MGC_12 Ho...    28    6.4 
gi|9768571|gb|BE539926.1|BE539926 601060667F2 NIH_MGC_10 Ho...    28    6.4 
gi|9342607|gb|BE397242.1|BE397242 601290754F1 NIH_MGC_8 Hom...    28    6.4 
gi|9332870|gb|BE387505.1|BE387505 601274247F1 NIH_MGC_20 Ho...    28    6.4 
gi|8140649|gb|AW950985.1|AW950985 EST363055 MAGE resequence...    28    6.4 
gi|8139665|gb|AW950129.1|AW950129 EST362094 MAGE resequence...    28    6.4 
gi|6879658|gb|AW375004.1|AW375004 MR0-CT0068-280999-002-f07...    28    6.4 
gi|5435227|emb|AL079651.1|AL079651 DKFZp434N0629_r1 434 (sy...    28    6.4 
gi|5406349|emb|AL036861.1|AL036861 DKFZp564O1963_r1 564 (sy...    28    6.4 
gi|2566893|gb|AA641675.1|AA641675 nr62g01.s1 NCI_CGAP_Lym3 ...    28    6.4 
gi|2080087|gb|AA418268.1|AA418268 zv96d09.s1 Soares_NhHMPu_...    28    6.4 
gi|2056455|gb|AA402650.1|AA402650 zu49g06.r1 Soares ovary t...    28    6.4 
gi|1516398|gb|AA040102.1|AA040102 zk46e02.r1 Soares_pregnan...    28    6.4 

Alignments

>gi|19365764|gb|BM915385.1|BM915385 AGENCOURT_6701642 NIH_MGC_41 Homo
sapiens cDNA clone IMAGE:5481560 5'.
Length = 1086

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 23 acctccctccgcggag 38

>gi|19353768|gb|BM903897.1|BM903897 AGENCOURT_6696012 NIH_MGC_67 Homo
sapiens cDNA clone IMAGE:5492392
Length = 1497

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Minus

Query: 1   acctccctccgcggag 16
           ||||||||||||||||
Sbjct: 445 acctccctccgcggag 430

>gi|18807810|gb|BM562052.1|BM562052 AGENCOURT_6562015 NIH_MGC_118 Homo
sapiens cDNA clone IMAGE:5745414 5'.
Length = 1175

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 20 acctccctccgcggag 35

>gi|18791603|gb|BM553137.1|BM553137 AGENCOURT_6572574 NIH_MGC_41 Homo
sapiens cDNA clone IMAGE:5467063 5'.
Length = 1100

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 26 acctccctccgcggag 41

>gi|16171065|gb|BI908151.1|BI908151 603067456F1 NIH_MGC_118 Homo
sapiens cDNA clone IMAGE:5216508 5'.
Length = 706

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 25 acctccctccgcggag 40

>gi|15759271|gb|BI767693.1|BI767693 603060648F1 NIH_MGC_122 Homo
sapiens cDNA clone IMAGE:5209978 5'.
Length = 862

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcggag 16
           ||||||||||||||||
Sbjct: 705 acctccctccgcggag 720

>gi|15343643|gb|BI518851.1|BI518851 603061760F1 NIH_MGC_118 Homo
sapiens cDNA clone IMAGE:5210943 5'.
Length = 943

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 25 acctccctccgcggag 40

>gi|14309343|gb|BG899094.1|BG899094 HOA21-1-G9 HOA (Human
Osteoarthritic Cartilage) Homo sapiens cDNA.
Length = 364

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 83 acctccctccgcggag 98

>gi|13662542|gb|BG611171.1|BG611171 602612144F1 NIH_MGC_60 Homo sapiens
cDNA clone IMAGE:4737466 5'.
Length = 897

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Minus

Query: 1   acctccctccgcggag 16
           ||||||||||||||||
Sbjct: 809 acctccctccgcggag 794

>gi|12609210|gb|BG115704.1|BG115704 602317174F1 NIH_MGC_88 Homo sapiens
cDNA clone IMAGE:4417482 5'.
Length = 838

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 51 acctccctccgcggag 66

>gi|12101282|gb|BF796228.1|BF796228 602258513F1 NIH_MGC_85 Homo sapiens
cDNA clone IMAGE:4341962 5'.
Length = 1081

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1 acctccctccgcggag 16
         ||||||||||||||||
Sbjct: 7 acctccctccgcggag 22

>gi|11152079|gb|BF238160.1|BF238160 601811886F1 NIH_MGC_48 Homo sapiens
cDNA clone IMAGE:4054821 5'.
Length = 811

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 11 acctccctccgcggag 26

>gi|11100313|gb|BF206727.1|BF206727 601871105F1 NIH_MGC_19 Homo sapiens
cDNA clone IMAGE:4101600 5'.
Length = 888

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 32 acctccctccgcggag 47

>gi|11100272|gb|BF206686.1|BF206686 601871051F1 NIH_MGC_19 Homo sapiens
cDNA clone IMAGE:4101517 5'.
Length = 917

Score = 32.2 bits (16), Expect = 0.41
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcggag 16
          ||||||||||||||||
Sbjct: 33 acctccctccgcggag 48

>gi|16775383|gb|BM046103.1|BM046103 603625849F1 NIH_MGC_40 Homo sapiens
cDNA clone IMAGE:5452309 5'.
Length = 869

Score = 30.2 bits (15), Expect = 1.6
Identities = 15/15 (100%)
Strand = Plus / Plus

Query: 2   cctccctccgcggag 16
           |||||||||||||||
Sbjct: 692 cctccctccgcggag 706

>gi|19739174|gb|BQ014273.1|BQ014273 UI-H-ED1-axs-h-21-0-UI.s1
NCI_CGAP_ED1 Homo sapiens cDNA clone IMAGE:5833028 3'.
Length = 772

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 495 cctccctccgcgga 482

>gi|19378603|gb|BM928224.1|BM928224 AGENCOURT_6699855 NIH_MGC_121 Homo
sapiens cDNA clone IMAGE:5770072 5'.
Length = 1140

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2    cctccctccgcgga 15
            ||||||||||||||
Sbjct: 1009 cctccctccgcgga 1022

>gi|19367808|gb|BM917429.1|BM917429 AGENCOURT_6606724 NIH_MGC_106 Homo
sapiens cDNA clone IMAGE:5483947 5'.
Length = 1073

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 916 acctccctccgcgg 929

>gi|19364214|gb|BM913835.1|BM913835 AGENCOURT_6612786 NIH_MGC_98 Homo
sapiens cDNA clone IMAGE:5477539 5'.
Length = 1104

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 842 cctccctccgcgga 829

>gi|19361343|gb|BM910964.1|BM910964 AGENCOURT_6615957 NIH_MGC_98 Homo
sapiens cDNA clone IMAGE:5454547 5'.
Length = 1128

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 3   ctccctccgcggag 16
           ||||||||||||||
Sbjct: 883 ctccctccgcggag 870

>gi|18505954|gb|BM456914.1|BM456914 AGENCOURT_6404253 NIH_MGC_92 Homo
sapiens cDNA clone IMAGE:5583862 5'.
Length = 1813

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2  cctccctccgcgga 15
          ||||||||||||||
Sbjct: 29 cctccctccgcgga 42

>gi|18499709|gb|BM450669.1|BM450669 AGENCOURT_6394717 NIH_MGC_67 Homo
sapiens cDNA clone IMAGE:5494366 5'.
Length = 1430

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1    acctccctccgcgg 14
            ||||||||||||||
Sbjct: 1150 acctccctccgcgg 1163

>gi|16000196|gb|BI859449.1|BI859449 603388188F1 NIH_MGC_87 Homo sapiens
cDNA clone IMAGE:5396997 5'.
Length = 852

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 100 acctccctccgcgg 113

>gi|15928460|gb|BI818193.1|BI818193 603032663F1 NIH_MGC_115 Homo
sapiens cDNA clone IMAGE:5173838 5'.
Length = 683

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2  cctccctccgcgga 15
          ||||||||||||||
Sbjct: 96 cctccctccgcgga 109

>gi|15431547|gb|BI544235.1|BI544235 603241605F1 NIH_MGC_95 Homo sapiens
cDNA clone IMAGE:5284296 5'.
Length = 676

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 3  ctccctccgcggag 16
          ||||||||||||||
Sbjct: 39 ctccctccgcggag 26

>gi|15345229|gb|BI520437.1|BI520437 603071622F1 NIH_MGC_119 Homo
sapiens cDNA clone IMAGE:5163773 5'.
Length = 727

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 505 acctccctccgcgg 492

>gi|14440373|gb|BI033747.1|BI033747 PM3-NN0223-220201-014-h04 NN0223
Homo sapiens cDNA.
Length = 284

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1  acctccctccgcgg 14
          ||||||||||||||
Sbjct: 97 acctccctccgcgg 84

>gi|14426676|gb|BI020046.1|BI020046 CM3-MT0291-110101-622-f04 MT0291
Homo sapiens cDNA.
Length = 436

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 365 acctccctccgcgg 352

>gi|14081325|gb|BG770672.1|BG770672 602734012F1 NIH_MGC_49 Homo sapiens
cDNA clone IMAGE:4859546 5'.
Length = 949

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1  acctccctccgcgg 14
          ||||||||||||||
Sbjct: 63 acctccctccgcgg 76

>gi|13546630|gb|BG547965.1|BG547965 602576071F1 NIH_MGC_77 Homo sapiens
cDNA clone IMAGE:4704209 5'.
Length = 918

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 248 acctccctccgcgg 261

>gi|13030375|gb|BG281450.1|BG281450 602401966F1 NIH_MGC_20 Homo sapiens
cDNA clone IMAGE:4544201 5'.
Length = 782

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 417 acctccctccgcgg 430

>gi|12951460|emb|AL582959.1|AL582959 AL582959 LTI_NFL010_BC2 Homo
sapiens cDNA clone CSODL008YA12 3 prime.
Length = 822

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 533 cctccctccgcgga 520

>gi|12764352|gb|BG254536.1|BG254536 602368464F1 NIH_MGC_91 Homo sapiens
cDNA clone IMAGE:4476902 5'.
Length = 1031

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 849 cctccctccgcgga 862

>gi|12378592|gb|BF961317.1|BF961317 PM3-NN0223-111200-004-d03 NN0223
Homo sapiens cDNA.
Length = 277

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1  acctccctccgcgg 14
          ||||||||||||||
Sbjct: 89 acctccctccgcgg 76

>gi|12374538|gb|BF957263.1|BF957263 PM3-NN0223-241100-002-b08 NN0223
Homo sapiens cDNA.
Length = 168

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 117 acctccctccgcgg 104

>gi|12323114|gb|BF926150.1|BF926150 CM2-NT0193-301100-562-a12 NT0193
Homo sapiens cDNA.
Length = 417

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 268 cctccctccgcgga 255

>gi|12259862|gb|BF869732.1|BF869732 IL3-ET0114-251000-316-A11 ET0114
Homo sapiens cDNA.
Length = 278

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1  acctccctccgcgg 14
          ||||||||||||||
Sbjct: 73 acctccctccgcgg 60

>gi|12129894|gb|BF800905.1|BF800905 PM1-CI0110-201000-003-f08 CI0110
Homo sapiens cDNA.
Length = 283

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 211 acctccctccgcgg 224

>gi|12071436|gb|BF744760.1|BF744760 QV2-BT0635-311000-440-c11 BT0635
Homo sapiens cDNA.
Length = 534

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 319 cctccctccgcgga 332

>gi|11770407|gb|BE965733.2|BE965733 601659792R1 NIH_MGC_70 Homo sapiens
cDNA clone IMAGE:3896134 3'.
Length = 1336

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 292 acctccctccgcgg 305

>gi|11766539|gb|BE963121.2|BE963121 601656923R1 NIH_MGC_67 Homo sapiens
cDNA clone IMAGE:3865924 3'.
Length = 1442

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 3   ctccctccgcggag 16
           ||||||||||||||
Sbjct: 403 ctccctccgcggag 390

>gi|10348536|gb|BE890328.1|BE890328 601431783F1 NIH_MGC_72 Homo sapiens
cDNA clone IMAGE:3916820 5'.
Length = 794

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 115 acctccctccgcgg 128

>gi|10142985|gb|BE728993.1|BE728993 601562251F1 NIH_MGC_20 Homo sapiens
cDNA clone IMAGE:3831924 5'.
Length = 840

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 397 acctccctccgcgg 410

>gi|10095527|gb|BE707262.1|BE707262 PM1-HT0452-060700-008-e08 HT0452
Homo sapiens cDNA.
Length = 592

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Minus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 343 acctccctccgcgg 330

>gi|9772196|gb|BE543551.1|BE543551 601070523F1 NIH_MGC_12 Homo sapiens
cDNA clone IMAGE:3456940 5'.
Length = 1035

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 332 acctccctccgcgg 345

>gi|9768571|gb|BE539926.1|BE539926 601060667F2 NIH_MGC_10 Homo sapiens
cDNA clone IMAGE:3447161 5'.
Length = 902

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 411 acctccctccgcgg 424

>gi|9342607|gb|BE397242.1|BE397242 601290754F1 NIH_MGC_8 Homo sapiens
cDNA clone IMAGE:3621253 5'.
Length = 524

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 228 cctccctccgcgga 241

>gi|9332870|gb|BE387505.1|BE387505 601274247F1 NIH_MGC_20 Homo sapiens
cDNA clone IMAGE:3615538 5'.
Length = 637

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 1   acctccctccgcgg 14
           ||||||||||||||
Sbjct: 422 acctccctccgcgg 435

>gi|8140649|gb|AW950985.1|AW950985 EST363055 MAGE resequences, MAGA
Homo sapiens cDNA.
Length = 638

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 273 cctccctccgcgga 286

>gi|8139665|gb|AW950129.1|AW950129 EST362094 MAGE resequences, MAGA
Homo sapiens cDNA.
Length = 611

Score = 28.2 bits (14), Expect = 6.4
Identities = 14/14 (100%)
Strand = Plus / Plus

Query: 2   cctccctccgcgga 15
           ||||||||||||||
Sbjct: 273 cctccctccgcgga 286

Database: GenBank Human EST entries
Posted date: Mar 29, 2002 2:35 AM
Number of letters in database: 2,114,234,064
Number of sequences in database: 4,280,058

Lambda     K      H
  1.37   0.711   1.31

Gapped
Lambda     K      H
  1.37   0.711   1.31

Matrix: blastn matrix: 1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 5013
Number of Sequences: 4280058
Number of extensions: 5013
Number of successful extensions: 5013
Number of sequences better than 10.0: 61
length of query: 16
length of database: 2,114,234,064
effective HSP length: 15
effective length of query: 1
effective length of database: 2,050,033,194
effective search space: 2050033194
effective search space used: 2050033194
T: 0
A: 30
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 14 (28.2 bits)
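
The Expect values in the report above can be related to the bit scores and to the effective search space listed in the search statistics. The following is a minimal sketch in Python, assuming the standard Karlin-Altschul relation E = m*n * 2^(-S'), where S' is the bit score and m*n is the effective search space. The function name `expect_value` is illustrative only and is not part of the BLAST software. Because the report prints bit scores rounded to one decimal place, the computed values agree with the reported 0.41, 1.6 and 6.4 only approximately.

```python
def expect_value(bit_score: float, search_space: float) -> float:
    """Expected number of chance hits with at least this bit score.

    Assumes the Karlin-Altschul relation E = m*n * 2**(-S').
    """
    return search_space * 2.0 ** (-bit_score)


# "effective search space" as printed in the search statistics above
SEARCH_SPACE = 2_050_033_194

if __name__ == "__main__":
    # Bit scores reported for the 16-, 15- and 14-base matches
    for bits in (32.2, 30.2, 28.2):
        print(f"{bits} bits -> E = {expect_value(bits, SEARCH_SPACE):.2f}")
```

A two-bit drop in score multiplies the Expect value by four, which is why the 14-base matches sit close to the reporting threshold of 10.0 while the full 16-base matches do not.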
1: BM915385. AGENCOURT_6701642...[gi:19365764]    LinkOut

IDENTIFIERS

dbEST Id:       11598757
EST name:       AGENCOURT_6701642
GenBank Acc:    BM915385
GenBank gi:     19365764

CLONE INFO

Clone Id:       IMAGE:5481560 (5')
Plate:          LLCM2006  Row: d  Column: 09
DNA type:       cDNA

PRIMERS

PolyA Tail:     Unknown

SEQUENCE

CCCG
CGGG
GAGT
GAGG
CTGG
TGGG
CACG
CTCG
CCTG
TCCG
GCCT
TCCC

CGCCCTGGGCCATCTCCnX^CCAQ^mnmSCGG^ 

GCCGGGGGCAGGGGGGACGCCCCGTCCGGGGCACCCCCCCGGCIXTTGAGCCGCCCG 

GCCGGCCTCGGCCaXlAGCGGAGGAAGGAGTaSCCGAGGAGCAGCCTGAGGCCCCA 

CTGAGACGAGCCGCCGCCGCCCCCGCCACTGCGGGGAGGAGGGGGAGGAGGAGCGG 

AGGGACGAGCTGGTCGGGAGAAGAGGAAAAAAACTTmAGACTTTTC 

GAGCCGGAGGCGCGGGGACCTCTTGGCGCGACGCTGCCCCGCGAGGAGGCAGGACT 

GACaCAGACCG(XnXX(nTreaG(XGGGGACG(nTGCT^^ 

GCGTCCCTCAGGCGCCCCCAmCGGACCAGCXCTCGGGAGTCGCCGACGCGGCCT 

CAAAGACTTTTCACCATAC(n€GGG(XX:AC(XriXnXX:AC^^ 

imcmGCCCCXGCGGATGCCTAGACCCTTTCTa^ 

ACCTGOXX^AAAmCCTATTCnXXJAACACCGCCGC^^ 

TTCGACG(nmTGCGCTGGGGAA(nXJAAGAG(XXX:CGGGTTCCT 

anriTGAAAAACATCCCCCGTTAATAAACCnTGACT^^ 
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CCCT 
CAAG 
ACCG 
CCCG 
CCTC 
GGGG 



TAC(KnTITIXXKXXKX:ACTAMCAMaTCGA(nt^ 

CCTGAATAamGCGC(nTA(XXKX:(XnXTITr 

GAaCTAmATmTTTC(X:GTGACGTGTGC(KXX^^ 

ACACATTGTGATAAAACACCACTTTCGACACGCCCTA(nx:CTGT^ 

CC(XCGTGTAAAATTTCCax:GaAATGaCrCCAmmcm€ 



TCGGCN 

Quality:        High quality sequence stops at base: 467

Entry Created:  Mar 11 2002
Last Updated:   Mar 12 2002

COMMENTS

                Tissue Procurement: DCTD/DTP
                cDNA Library Preparation: Rubin Laboratory
                cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
                DNA Sequencing by: Agencourt Bioscience Corporation
                Clone distribution: MGC clone distribution information can
                be found through the I.M.A.G.E. Consortium/LLNL at:
                http://image.llnl.gov

Lib Name:       NIH_MGC_41
Organism:       Homo sapiens
Organ:          skin
Tissue type:    amelanotic melanoma, cell line
Lab host:       DH10B (phage-resistant)
Vector:         pOTB7
R. Site 1:      XhoI
R. Site 2:      EcoRI
Description:    cDNA made by oligo-dT priming. Directionally cloned into
                EcoRI/XhoI sites using the following 5' adaptor:
                GGCACGAG(G). Library constructed by Ling Hong in the
                laboratory of Gerald Rubin (University of California,
                Berkeley) using ZAP-cDNA synthesis kit (Stratagene) and
                Superscript II RT (Life Technologies). Note: this is a
                NIH_MGC Library.

SUBMITTER

Name:           Robert Strausberg, Ph.D.
E-mail:         cgapbs-r@mail.nih.gov

CITATIONS

Title:          National Institutes of Health, Mammalian Gene Collection
                (MGC)
Authors:        NIH-MGC http://mgc.nci.nih.gov/
Year:           1999
Status:         Unpublished
Revised: October 24, 2001. 
Check on EST in GenBank:
Query= (1086 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS, or phase 0, 1 or 2 HTGS sequences)
1,205,903 sequences; 5,297,768,116 total letters

                                                           Score    E
Sequences producing significant alignments:                (bits) Value

gi|10863872|ref|NM_000660.1| Homo sapiens transforming grow...  587  e-165
gi|18590091|ref|XM_085882.1| Homo sapiens similar to transf...  587  e-165
gi|11424057|ref|XM_008912.1| Homo sapiens transforming grow...  587  e-165
gi|7684381|gb|AC011462.4|AC011462 Homo sapiens chromosome 1...  587  e-165
gi|37097|emb|X05839.1|HSTGFBG1 Human transforming growth fa...  587  e-165
gi|37092|emb|X02812.1|HSTGFB1 Human mRNA for transforming g...  587  e-165
gi|340526|gb|J04431.1|HUMTGFB1PR Homo sapiens transforming ...  587  e-165
gi|12654682|gb|BC001180.1|BC001180 Homo sapiens, Similar to...  291  8e-76
gi|12652748|gb|BC000125.1|BC000125 Homo sapiens, Similar to...  291  8e-76
gi|18490115|gb|BC022242.1| Homo sapiens, clone MGC:22008 IM...  153  4e-34
gi|755044|gb|M23703.1|PIGTGFB1A Sus scrofa transforming gro...  129  6e-27
gi|7650477|gb|AF249327.1|AF249327 Rattus norvegicus TGF-bet...   66  8e-08
gi|4416081|gb|AF105069.1|AF105069 Rattus norvegicus transfo...   66  8e-08
gi|2394170|gb|AF015683.1|AF015683 Rattus norvegicus transfo...   66  8e-08
gi|6755774|ref|NM_011577.1| Mus musculus transforming growt...   64  3e-07
gi|1161133|gb|L42456.1|MUSTGF1G01 Mus musculus TGF-1 gene, ...   64  3e-07
gi|3688423|emb|AJ009862.1|MMU009862 Mus musculus mRNA for t...   64  3e-07
gi|201947|gb|M57902.1|MUSTGFB1 Mouse transforming growth fa...   64  3e-07
gi|18042365|gb|AC097483.3| Homo sapiens BAC clone RP11-146N...   44  0.30
gi|17481821|ref|XM_008785.3| Homo sapiens one cut domain, f...   44  0.30
gi|12737997|ref|XM_007116.2| Homo sapiens Zic family member...   44  0.30
gi|6005961|ref|NM_007129.1| Homo sapiens Zic family member ...   44  0.30
gi|11065969|gb|AF193855.1|AF193855 Homo sapiens zinc finger...   44  0.30
gi|4758847|ref|NM_004852.1| Homo sapiens one cut domain, fa...   44  0.30
gi|15787728|emb|AL355338.33|AL355338 Human DNA sequence fro...   44  0.30
gi|4028591|gb|AF104902.1|AF104902 Homo sapiens ZIC2 protein...   44  0.30
gi|1531593|gb|U50523.1|HSU50523 Human BRCA2 region, mRNA se...   44  0.30
gi|4468940|emb|Y18198.1|HSAY18198 Homo sapiens mRNA for ONE...   44  0.30
gi|19067958|gb|AY049805.1| Alopias pelagicus 5.8S ribosomal ...  42  1.2
gi|18025465|gb|AY037858.1| Cercopithicine herpesvirus 15 st...   42  1.2
gi|12039248|gb|AC020659.5|AC020659 Homo sapiens chromosome ...   42  1.2
gi|19909461|gb|AC098709.3| Mus musculus clone RP23-1K14, co...   40  4.6
gi|19921137|ref|NM_135651.1| Drosophila melanogaster (CG47...    40  4.6
gi|18376846|gb|AC092198.2| Homo sapiens chromosome X clone ...   40  4.6
gi|18467841|ref|XM_078995.1| CG4751 (CG4751), mRNA               40  4.6
gi|18376869|gb|AC091898.2| Homo sapiens chromosome 5 clone ...   40  4.6
gi|18030132|gb|AC026695.5| Homo sapiens chromosome 5 clone ...   40  4.6
gi|15887302|gb|AC020914.8| Homo sapiens chromosome 19 clone...   40  4.6
gi|14578122|gb|AC092241.1|AC092241 Drosophila melanogaster, ...  40  4.6
gi|15292266|gb|AY051978.1| Drosophila melanogaster LD44770 ...   40  4.6
gi|15055218|gb|AC060226.39| Homo sapiens 12 BAC RP11-101P14...   40  4.6
gi|14389338|gb|AC084282.6|AC084282 Oryza sativa chromosome ...   40  4.6
gi|13677167|gb|AC015977.9|AC015977 Homo sapiens clone RP11-...   40  4.6
gi|9910225|ref|NM_020179.1| Homo sapiens FN5 protein (FN5), ...  40  4.6
gi|10440613|gb|AC069145.5|AC069145 Oryza sativa chromosome ...   40  4.6
gi|10728714|gb|AE003631.2|AE003631 Drosophila melanogaster ...   40  4.6
gi|9246422|gb|AF197137.1|AF197137 Homo sapiens FN5 protein ...   40  4.6
gi|4190938|gb|AC000091.1|AC000091 Homo sapiens Chromosome 2...   40  4.6
gi|17431932|emb|AL646085.1|AL646085 Ralstonia solanacearum ...   40  4.6
gi|15073719|emb|AL591785.1|SME591785 Sinorhizobium meliloti...   40  4.6
gi|3628578|gb|AC005115.1|AC005115 Drosophila melanogaster D...   40  4.6
gi|3150432|gb|U50080.1|LSU50080 Lymnaea stagnalis serotonin...   40  4.6
gi|8052359|emb|AL356592.1|SC9H11 Streptomyces coelicolor co...   40  4.6
gi|6624640|emb|AL034344.24|HS118B18 Human DNA sequence from...   40  4.6
gi|15528721|dbj|AP003296.3| Oryza sativa (japonica cultivar...   40  4.6
gi|15289781|dbj|AP003141.2| Oryza sativa (japonica cultivar...   40  4.6
gi|6069643|dbj|AP000616.1| Oryza sativa (japonica cultivar-...   40  4.6
gi|960285|gb|L46862.1|RATLAMB2G Rattus norvegicus laminin B...   40  4.6
gi|198704|gb|J03749.1|MUSLAMB2B Mouse laminin B2 gene, exon...   40  4.6
gi|198702|gb|J02930.1|MUSLAMB2A Mouse laminin B2 chain mRNA...   40  4.6
gi|198694|gb|J03484.1|MUSLAM2B Mouse laminin B2 chain mRNA, ...  40  4.6
Alignments

>gi|10863872|ref|NM_000660.1| Homo sapiens transforming growth factor,
beta 1 (Camurati-Engelmann disease) (TGFB1), mRNA
Length = 2745

Score = 587 bits (296), Expect = e-165
Identities = 356/377 (94%), Gaps = 1/377 (0%)
Strand = Plus / Plus

Query: 246 cgagctggtcgggagaagaggnnnnnnncttttgagacttttccgttgccgctgggagcc 305
           |||||||||||||||||||||       ||||||||||||||||||||||||||||||||
Sbjct: 225 cgagctggtcgggagaagaggaaaaaaacttttgagacttttccgttgccgctgggagcc 284

Query: 306 ggaggcgcggggacctcttggcgcgacgctgccccgcgaggaggcaggacttggggaccc 365
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 285 ggaggcgcggggacctcttggcgcgacgctgccccgcgaggaggcaggacttggggaccc 344

Query: 366 cagaccgcctccctttgccgccggggacgcttgctccctccctgccccctacacggcgtc 425
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 345 cagaccgcctccctttgccgccggggacgcttgctccctccctgccccctacacggcgtc 404

Query: 426 cctcaggcgcccccattccggaccagccctcgggagtcgccgacccggcctctcgcaaag 485
           |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct: 405 cctcaggcgcccccattccggaccagccctcgggagtcgccgacccggcctcccgcaaag 464

Query: 486 acttttcaccatacctcgggcgcaccctctgcacgcggccttcatcaccggcctgtctac 545
           ||||||| ||| ||||||||||||||| |||||||| ||||||||| ||||||||||| |
Sbjct: 465 acttttccccagacctcgggcgcaccccctgcacgccgccttcatccccggcctgtctcc 524

Query: 546 tgagcccccgcggatgcctagaccctttctcctccgggagacggatccctctccgacctg 605
           |||||||||||| || ||||||||||||||||||| ||||||||||| ||||||||||||
Sbjct: 525 tgagcccccgcgcat-cctagaccctttctcctccaggagacggatctctctccgacctg 583

Query: 606 ccgcaaattccctattc 622
           || || || ||||||||
Sbjct: 584 ccacagatcccctattc 600
[0083]

Example 6: Statistical analysis of 5' end sequence tags

5' end sequence tags obtained from the same plurality of mRNAs in a
sample, or from nucleic acid fragments within the same cDNA library, can
be analyzed with standard software such as NCBI BLAST
(http://www.ncbi.nlm.nih.gov/BLAST/) to identify non-redundant sequence
tags as described in Example 5. All such non-redundant sequence tags can
then be counted individually and further analyzed for the contribution
of each non-redundant tag to the total number of all tags obtained from
the same sample. The contribution of an individual tag to the total
number of all tags should allow for a quantification of the transcripts
in a plurality of mRNAs in the sample or in a cDNA library. The results
obtained in this way on individual samples can be further compared with
similar data obtained from other samples to compare their expression
patterns.
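
The counting step described in this Example can be illustrated by a short sketch in Python. It is offered only as an illustration under stated assumptions: tags are assumed to be already reduced to plain sequence strings, redundancy is assumed to mean exact sequence identity (the text itself uses BLAST for this step), and the function name `tag_frequencies` is hypothetical.

```python
from collections import Counter


def tag_frequencies(tags):
    """Count each distinct tag and its share of all tags in one sample.

    Returns {tag: (count, fraction_of_total)}, most frequent tag first.
    Exact string identity is used as a simple stand-in for the
    BLAST-based identification of non-redundant tags.
    """
    counts = Counter(tag.lower() for tag in tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.most_common()}


if __name__ == "__main__":
    # Illustrative tags only; not data from the specification.
    sample = ["acctccctccgcggag", "ACCTCCCTCCGCGGAG", "ttgacctgaatgctgt"]
    for tag, (n, share) in tag_frequencies(sample).items():
        print(tag, n, f"{share:.2f}")
```

Running the same function on tags from a second sample gives a second table of fractions, which can then be compared tag by tag.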
[0084]

Example 7: Mapping of 5' end sequence tags to genomic sequence
information

5' end specific sequence tags obtained as described in this Example can
be used to identify transcribed regions within genomes for which partial
or entire sequences were obtained. Such a search can be performed using
standard software such as NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
to align the 5' end specific sequence tags to genomic sequences. In the
case of large genomes like those of human, rat or mouse, it may be
necessary to extend the initial sequence information obtained from
concatemers, for example by the approach described in Example #. The use
of extended sequences allows for a more precise identification of
actively transcribed regions in the genome.
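
As a simplified illustration of the mapping step, the following Python sketch locates exact occurrences of a tag on both strands of a genomic sequence held in memory. It is not a substitute for BLAST, which also reports partial matches and handles genome-scale databases; the function names `reverse_complement` and `map_tag` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (lower case)."""
    return seq.lower().translate(str.maketrans("acgtn", "tgcan"))[::-1]


def map_tag(tag: str, genome: str):
    """List (position, strand) for every exact occurrence of the tag.

    position is the 1-based genomic coordinate of the tag's 5' base,
    so for a minus-strand hit it is the right-hand end of the match.
    """
    tag, genome = tag.lower(), genome.lower()
    hits = []
    for strand, pattern in (("+", tag), ("-", reverse_complement(tag))):
        idx = genome.find(pattern)
        while idx != -1:
            pos = idx + 1 if strand == "+" else idx + len(pattern)
            hits.append((pos, strand))
            idx = genome.find(pattern, idx + 1)  # allow overlapping hits
    return hits


if __name__ == "__main__":
    # Illustrative sequence only; not data from the specification.
    print(map_tag("acctccctccgcggag", "ttacctccctccgcggagaa"))
```

A tag that yields more than one hit is ambiguous at its current length, which is the situation in which the extended sequences mentioned above become useful.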



[0085]

Example 8: Identification of transcriptional start sites

5' end specific sequence tags which could be mapped to genomic
sequences allow for the identification of regulatory sequences. In a
gene, the DNA upstream of the 5' end of transcribed regions usually
encompasses most of the regulatory elements which are used in the
control of gene expression. These regulatory sequences can be further
analyzed for their functionality by searches in databases which hold
information on binding sites for transcription factors. Publicly
available databases on transcription factor binding sites and for
promoter analysis include:
Transcription Regulatory Region Database (TRRD)
(http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/)
TRANSFAC (http://transfac.gbf.de/TRANSFAC/)
TFSEARCH (http://www.cbrc.jp/research/db/TFSEARCH.html)
Promoter Inspector provided by Genomatix Software (http://www.genomatix.de
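
Once a tag has been mapped, the sequence upstream of its 5' end can be extracted and submitted to databases such as those listed above. The following Python sketch shows one way to do this, assuming the genomic sequence is held as a string and the mapped position is the 1-based coordinate of the tag's first base. The function names and the default length of 500 bases are assumptions for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (lower case)."""
    return seq.lower().translate(str.maketrans("acgtn", "tgcan"))[::-1]


def upstream_region(genome: str, tss: int, strand: str, length: int = 500) -> str:
    """Sequence immediately upstream of a mapped 5' end.

    tss is the 1-based genomic coordinate of the first transcribed base.
    The result is written 5'->3' on the transcribed strand, so it can be
    scanned for binding sites in transcript orientation.
    """
    genome = genome.lower()
    if strand == "+":
        start = max(0, tss - 1 - length)
        return genome[start:tss - 1]
    # Minus strand: upstream bases lie at higher genomic coordinates.
    return reverse_complement(genome[tss:tss + length])


if __name__ == "__main__":
    # Illustrative sequence only; not data from the specification.
    print(upstream_region("gattacagggg", tss=8, strand="+", length=3))
```

The region is truncated automatically when the mapped position lies closer to the end of the available sequence than the requested length.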



[0086]

Example 9: Cloning of full-length cDNAs using information derived from
5' end sequence tags

Sequence information derived from the concatamers can be used to
synthesize specific primers for the cloning of full-length cDNAs. In
such an approach, the sequence derived from a given 5' end specific tag
can be used to design a forward primer, while the choice of the reverse
primer would depend on the template DNA used in the amplification
reaction. Amplification by the polymerase chain reaction (PCR) can be
performed using a template derived from a plurality of RNA obtained from
a biological sample and an oligo-dT primer. In the first step the
oligo-dT primer and a reverse transcriptase are used to synthesize a
cDNA pool. In the second step a forward primer derived from a 5' end
specific tag and an oligo-dT primer are used to amplify a full-length
cDNA from the cDNA pool. Similarly, a specific full-length cDNA can be
amplified from an existing cDNA library using a forward primer derived
from a 5' end tag and a vector-nested reverse primer.
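
When a forward primer is derived from a 5' end tag, its base composition and approximate melting temperature are usually checked before synthesis. The following Python sketch is an illustration only: it uses the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C), which is a rough estimate for short oligonucleotides and is not prescribed by this specification. All function names and the maximum primer length of 20 bases are assumptions.

```python
def forward_primer_from_tag(tag: str, max_len: int = 20) -> str:
    """Use the 5' end tag itself (up to max_len bases) as forward primer."""
    return tag.lower()[:max_len]


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.lower()
    return (p.count("g") + p.count("c")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule."""
    p = primer.lower()
    at = p.count("a") + p.count("t")
    gc = p.count("g") + p.count("c")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = forward_primer_from_tag("ACCTCCCTCCGCGGAG")
    print(primer, gc_fraction(primer), wallace_tm(primer))
```

The reverse primer (oligo-dT or a vector-nested primer, as described above) would be chosen separately, and the annealing temperature adjusted to the lower of the two estimates.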
[0087]

Example 10: Alternative approaches for the cloning of 5'-end tags from
cDNA libraries

A plurality of cDNAs can be amplified from an existing cDNA library
having a recognition site for a class IIs endonuclease at the 5' end of
the inserts. The PCR products derived from such a library would be further
treated as described in the examples herein.
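
Why a class IIs recognition site at the 5' end of the insert yields a tag of defined length can be shown with a small Python sketch. It is a simplification under stated assumptions: the recognition sequences and top-strand cutting distances in the table (Mme I: TCCRAC, 20 bases; Gsu I: CTGGAG, 16 bases) are commonly cited values that should be checked against the enzyme supplier's data, the recognition site is assumed to abut the insert directly, and the linker and insert sequences are invented.

```python
import re

# Recognition site (as a regular expression) and the number of bases between
# the end of the site and the top-strand cut. Illustrative values only.
CLASS_IIS = {
    "MmeI": (r"TCC[AG]AC", 20),
    "GsuI": (r"CTGGAG", 16),
}


def excise_tag(construct: str, enzyme: str = "MmeI") -> str:
    """Return the bases between the recognition site and the top-strand cut."""
    pattern, reach = CLASS_IIS[enzyme]
    seq = construct.upper()
    match = re.search(pattern, seq)
    if match is None:
        raise ValueError(f"no {enzyme} site found")
    tag = seq[match.end():match.end() + reach]
    if len(tag) < reach:
        raise ValueError("insert is shorter than the cutting distance")
    return tag


# Hypothetical linker: a Bgl II site (AGATCT) overlapping an Mme I site (TCCGAC).
linker = "AGATCTCCGAC"
# Made-up 5' end of a cDNA insert.
insert = "ACGTTGCAAGGCTTACCGGATTACAGGCT"
print(excise_tag(linker + insert))
```

Because the enzyme cuts at a fixed distance from its site rather than within it, every insert in the library gives a tag of the same length regardless of its sequence.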
[0088]

Example 11: Cloning of 5' ends by replacement of the Cap structure by an
oligonucleotide having a class IIs recognition site

A cDNA/RNA hybrid encompassing the 5' end of an initial transcript can
be obtained as described in Example 1. The Cap structure in such cDNA/RNA
hybrids is then enzymatically removed by a hydrolyzing enzyme such as
T4 polynucleotide kinase or tobacco acid pyrophosphatase. A single- or
double-stranded oligonucleotide having a class IIs recognition site is
then ligated by T4 RNA ligase to the phosphate present at the 5' end of
the de-capped mRNA. The ligated oligonucleotide will function as a primer
for the second strand synthesis following the procedure given in
Example 1. By the use of a modified oligonucleotide in the ligation
step, the double-stranded cDNA can be attached to a support and used for
the cloning of concatamers as described herein.



[0089]

Example 12: Amplification step for a sample

In cases where the amount of a sample is limiting for the invention, the
sample material can be amplified by the following approach. In a first
step, a plurality of mRNAs is treated as described in Example 11 to
replace the cap structure by an appropriate oligonucleotide having a
class IIs recognition site. In a second step, the aforementioned template
is amplified by a PCR step using a primer complementary to the linker and
a poly-A primer. The PCR product can be used for the invention as
described in Example 1.
[0090]

Example 13: Utilization of extended 5'-end sequences

Initial 5' end sequences obtained from concatamers can be used to
synthesize sequencing primers to obtain extended sequence information on
the 5' end of a transcribed region.



[0091]

Example 14: Gene inactivation

Sequence information obtained from 5' end specific sequence tags can be
used for the design of antisense probes, which could be applied in
knockdown studies.
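
As a minimal illustration of the antisense design mentioned above, the following Python sketch derives the reverse complement of a sense-strand tag. The function name and the example sequence are assumptions for illustration; a real probe design would also have to consider length, specificity and chemistry, which are outside this sketch.

```python
# Translation table for DNA base complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(tag: str) -> str:
    """Return the reverse complement of a sense-strand tag, written 5'->3'."""
    return tag.translate(_COMPLEMENT)[::-1]


# Hypothetical tag sequence, invented for this example.
print(antisense("GACTGCATCGGATCCTAGCA"))
```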



[0092]

By the present invention, novel means are provided by which not only can
information on the nucleotide sequences of mRNAs contained in a sample be
obtained, but novel genes can also be cloned. By the method of the present
invention, information on the nucleotide sequences of the 5' end regions
of a plurality of nucleic acids, such as mRNAs and cDNAs, in the sample
can be obtained effectively. Since the information on the nucleotide
sequences of the 5' end regions is obtained, unknown genes can be cloned
after the identification of a novel transcript. Further, it may be
possible to achieve mapping of transcription start sites, mapping of
promoter usage patterns, analysis of SNPs in promoters, construction of
gene networks by combining the expression analysis, alternative promoter
usage and the other data in this disclosure, and selective recovery of
promoter regions from fragmented genomic DNA.
[0093]

In particular, the invention has a great impact on the identification,
cloning and further analysis of promoter regions. After sequencing
concatamer libraries holding information on a plurality of 5' ends, a
statistical analysis of the distribution of the transcriptional start
sites will be possible. Changes between different physiological
conditions switch the mRNA transcription machinery into a new "status".
Such a "transcriptional status" can be measured by computing (1) the
presence of the transcription starting points and (2) the digital
expression of the various transcription factors, obtained by counting
their tags, and by correlating the presence of the starting points with
the expression of the transcription factors. More information on gene
networks will be obtained by comparing the perturbation of gene
expression between two different conditions. Such comparisons of
transcriptional conditions between various diseased and normal tissues
could allow for the design of new and very comprehensive diagnostic
tools. Thus the invention will be of high commercial value in gene
discovery and gene analysis, and it is envisioned that the invention will
be of use in the development of novel diagnostic and therapeutic products.
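
The "digital expression" obtained by counting tags, and its comparison between two conditions, can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the function names, the normalization to tags per million and the pseudocount are choices made for the example and are not prescribed by this disclosure.

```python
from collections import Counter


def tags_per_million(tags):
    """Normalize raw tag counts to tags per million sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def compare_conditions(tags_a, tags_b, pseudocount=1.0):
    """Ratio of normalized tag frequencies, condition B over condition A.

    The pseudocount avoids division by zero for tags seen in only one
    condition; its value is an illustrative assumption.
    """
    tpm_a = tags_per_million(tags_a)
    tpm_b = tags_per_million(tags_b)
    return {
        tag: (tpm_b.get(tag, 0.0) + pseudocount) / (tpm_a.get(tag, 0.0) + pseudocount)
        for tag in set(tpm_a) | set(tpm_b)
    }


# Hypothetical tag lists, as they might be read out of sequenced concatamers.
normal = ["TAG1", "TAG1", "TAG2", "TAG3"]
disease = ["TAG1", "TAG2", "TAG2", "TAG2"]
print(compare_conditions(normal, disease))
```

Each distinct tag stands for one transcription start site, so the same counts can serve both as a record of which start sites are present and as a measure of how strongly each is used.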
[0094]

[REFERENCES] 

Velculescu VE, Zhang L, Vogelstein B, Kinzler KW, Serial analysis of
gene expression. Science 1995 Oct 20; 270(5235): 484-7
US patent 5,866,330 (SAGE)
US patent 5,695,937 (SAGE)

Carninci P et al., Methods in Enzymology, Vol. 303, pp. 19-44, 1999
Lee S, Clark T, Chen J, Zhou G, Scott LR, Rowley JD, Wang SM, Correct
identification of genes from serial analysis of gene expression tag
sequences. Genomics 2002 Apr; 79(4): 598-602

Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW,
Velculescu VE, Using the transcriptome to annotate the genome, Nat
Biotechnol 2002 May; 20(5): 508-12

Maruyama K and Sugano S, Oligo-capping: a simple method to replace the
cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene. 1994,
Vol. 138: 171-4

Edery I, Chu LL, Sonenberg N, Pelletier J, An efficient strategy to
isolate full-length cDNAs based on an mRNA cap retention procedure
(CAPture), Mol Cell Biol 1995 Jun; 15(6): 3363-71
US patent 6,022,715 (GenSet)

Shibata Y, Carninci P, Watahiki A, Shiraki T, Konno H, Muramatsu M,
Hayashizaki Y, Cloning full-length, cap-trapper-selected cDNAs by using
the single-strand linker ligation method, Biotechniques 2001 Jun; 30(6):
1250-4

Sambrook J and Russell DW, Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, New York, 2001

Carninci P, Shibata Y, Hayatsu N, Itoh M, Shiraki T, Hirozane T,
Watahiki A, Shibata K, Konno H, Muramatsu M, Hayashizaki Y, Balanced-size
and long-size cloning of full-length, cap-trapped cDNAs into vectors of
the novel lambda-FLC family allows enhanced gene discovery rate and
functional analysis. Genomics. 2001 Sep; 77(1-2): 79-90.

Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV,
Ignatieva EV, Ananko EA, Podkolodnaya OA, Kolpakov FA, Podkolodny NL,
Kolchanov NA, Databases on transcriptional regulation: TRANSFAC, TRRD and
COMPEL, Nucleic Acids Res 1998 Jan 1; 26(1): 362-7

Maruyama K, Sugano S. Oligo-capping: a simple method to replace the cap
structure of eukaryotic mRNAs with oligoribonucleotides. Gene. 1994 Jan
28; 138(1-2): 171-4.

Jordan B., DNA Microarrays: Gene Expression Applications,
Springer-Verlag, Berlin Heidelberg New York, 2001

Schena A, DNA Microarrays, A Practical Approach, Oxford University Press,
Oxford 1999

US patent 5,962,272 (Clontech) 

Carninci P, Shiraki T, Mizuno Y, Muramatsu M, Hayashizaki Y, Extra-long
first-strand cDNA synthesis, Biotechniques 2002 May; 32(5): 984-5
US patents 6,352,828; 6,306,597; 6,280,935; 6,265,163; 5,695,934 (Lynx)
4 Brief Description of Drawings 



[Fig. 1]

An example for the preparation of a plurality of 1st strand cDNAs is
presented, where the starting material can be RNA derived from a
biological sample or a cDNA library.



[Fig. 2]

An example for the cloning of 5'-end specific tags into concatemers is
presented. The example includes, but is not limited to, the use of the
restriction enzymes Gsu I, Bgl II and Eco RI.



[Fig. 3]

An example for a 1st linker to be used for the cloning of 5'-end specific
tags is presented. The example includes, but is not limited to, the use
of the restriction enzymes Bgl II, Gsu I, and Mme I.
[Fig. 4]

An example for a 2nd linker to be used for the cloning of 5'-end specific
di-tags is presented. The example includes, but is not limited to, the
use of the restriction enzymes Bgl II, Gsu I, Mme I and Eco RI.
[Fig. 5]

An example for the structure of a di-tag is presented. The example
includes, but is not limited to, the use of the restriction enzymes
Bgl II, Gsu I, Mme I and Eco RI.
[Fig. 6]

An example for the use of a 5'-end specific linker is presented, in which
the linker is used for the enrichment of individual nucleic acids and
their sequencing.
Fig. 6: Serial Sequencing
1 Abstract

A method is disclosed to obtain the 5' ends of transcribed regions from
a plurality of nucleic acid fragments obtained from biological materials
or synthetic pools. DNA fragments encoding the 5' ends are enriched for
their individual analysis or for the analysis of concatamers thereof. The
sequence information derived from 5' ends can be used for
characterization and cloning of the transcriptome.